As previous representations for reinforcement learning cannot effectively
incorporate a human-intuitive understanding of the 3D environment, they usually
suffer from suboptimal performance. In this paper, we present Semantic-aware
Neural Radiance Fields for Reinforcement Learning (SNeRL), which jointly
optimizes semantic-aware neural radiance fields (NeRF) with a convolutional
encoder to learn 3D-aware neural implicit representation from multi-view
images. We introduce 3D semantic and distilled feature fields in parallel to
the RGB radiance fields in NeRF to learn semantic and object-centric
representation for reinforcement learning. SNeRL outperforms not only previous
pixel-based representations but also recent 3D-aware representations in both
model-free and model-based reinforcement learning.
( 2 min )
Correlation matrix visualization is essential for understanding the
relationships between variables in a dataset, but missing data can pose a
significant challenge in estimating correlation coefficients. In this paper, we
compare the effects of various missing data methods on the correlation plot,
focusing on two common missing patterns: random and monotone. We aim to provide
practical strategies and recommendations for researchers and practitioners in
creating and analyzing the correlation plot. Our experimental results suggest
that while imputation is commonly used for missing data, using imputed data for
plotting the correlation matrix may lead to significantly misleading inferences
about the relationships between features. We recommend using DPER, a
direct parameter estimation approach, for plotting the correlation matrix based
on its performance in the experiments.
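The failure mode described above can be reproduced with a small numpy experiment (an illustration of why imputed data distort the correlation plot, not the DPER algorithm itself; the data and the missingness rate are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated variables with values missing completely at random.
n = 2000
x = rng.normal(size=n)
y = 0.9 * x + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
x[rng.random(n) < 0.4] = np.nan  # 40% of x is missing

def pairwise_corr(a, b):
    """Correlation using only rows where both values are observed."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    return np.corrcoef(a[mask], b[mask])[0, 1]

# Mean imputation: fill missing x with the observed mean, then correlate.
x_imp = np.where(np.isnan(x), np.nanmean(x), x)
r_pairwise = pairwise_corr(x, y)
r_imputed = np.corrcoef(x_imp, y)[0, 1]

print(round(r_pairwise, 3), round(r_imputed, 3))
```

With these settings the pairwise-complete estimate stays close to the true value of 0.9, while the estimate from mean-imputed data is attenuated roughly by the square root of the observed fraction, which is exactly the kind of misleading correlation plot the experiments warn about.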
( 2 min )
This paper focuses on solving a fault detection problem using multivariate
time series of vibration signals collected from planetary gearboxes in a test
rig. Various traditional machine learning and deep learning methods have been
proposed for multivariate time-series classification, including distance-based,
functional data-oriented, feature-driven, and convolution kernel-based methods.
Recent studies have shown that convolution kernel-based methods such as ROCKET
and 1D convolutional neural networks such as ResNet and FCN achieve robust
performance for multivariate time-series data classification. We propose an
ensemble of three convolution kernel-based methods and show its efficacy on
this fault detection problem by outperforming other approaches and achieving an
accuracy of more than 98.8\%.
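A minimal sketch of the convolution kernel-based idea (random kernels plus pooling and a ridge classifier, in the spirit of ROCKET; the toy spike-detection task and all hyperparameters are invented for illustration, not the paper's gearbox setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary task: series containing a sharp spike vs. smooth noise.
def make_series(label, n=100):
    s = rng.normal(scale=0.3, size=n)
    if label == 1:
        s[rng.integers(20, 80)] += 5.0  # inject a spike
    return s

X = np.stack([make_series(i % 2) for i in range(200)])
y = np.array([i % 2 for i in range(200)])

# ROCKET-style transform: many random convolution kernels, then pooling
# (here: max value and proportion of positive values per kernel).
def rocket_features(X, n_kernels=100):
    feats = []
    for _ in range(n_kernels):
        w = rng.normal(size=9)
        b = rng.normal()
        out = np.stack([np.convolve(x, w, mode="valid") + b for x in X])
        feats.append(out.max(axis=1))
        feats.append((out > 0).mean(axis=1))
    return np.column_stack(feats)

F = rocket_features(X)
F = (F - F.mean(0)) / (F.std(0) + 1e-8)
# Ridge-regression classifier on the random features (closed form).
w = np.linalg.solve(F.T @ F + 1e3 * np.eye(F.shape[1]), F.T @ (2 * y - 1))
acc = ((F @ w > 0).astype(int) == y).mean()
print(acc)
```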
( 2 min )
Likelihood-free inference methods typically make use of a distance between
simulated and real data. A common example is the maximum mean discrepancy
(MMD), which has previously been used for approximate Bayesian computation,
minimum distance estimation, generalised Bayesian inference, and within the
nonparametric learning framework. The MMD is commonly estimated at a root-$m$
rate, where $m$ is the number of simulated samples. This can lead to
significant computational challenges since a large $m$ is required to obtain an
accurate estimate, which is crucial for parameter estimation. In this paper, we
propose a novel estimator for the MMD with significantly improved sample
complexity. The estimator is particularly well suited for computationally
expensive smooth simulators with low- to mid-dimensional inputs. This claim is
supported through both theoretical results and an extensive simulation study on
benchmark simulators.
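For reference, the standard unbiased U-statistic estimator of the squared MMD, which the proposed estimator improves upon, can be sketched as follows (the Gaussian kernel and the toy Gaussian samples are illustrative choices):

```python
import numpy as np

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimator of squared MMD with a Gaussian kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth**2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    np.fill_diagonal(kxx, 0.0)  # U-statistic: drop i == j terms
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(500, 1)), rng.normal(size=(500, 1)))
diff = mmd2_unbiased(rng.normal(size=(500, 1)), rng.normal(2.0, 1.0, size=(500, 1)))
print(same < diff)  # distinct distributions give a larger MMD estimate
```

The estimate for identical distributions concentrates around zero at the root-$m$ rate discussed above, which is what drives the sample-complexity bottleneck for expensive simulators.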
( 2 min )
In recent years neural networks have achieved impressive results on many
technological and scientific tasks. Yet, the mechanism through which these
models automatically select features, or patterns in data, for prediction
remains unclear. Identifying such a mechanism is key to advancing performance
and interpretability of neural networks and promoting reliable adoption of
these models in scientific applications. In this paper, we identify and
characterize the mechanism through which deep fully connected neural networks
learn features. We posit the Deep Neural Feature Ansatz, which states that
neural feature learning occurs by implementing the average gradient outer
product to up-weight features strongly related to model output. Our ansatz
sheds light on various deep learning phenomena including emergence of spurious
features and simplicity biases and how pruning networks can increase
performance, the "lottery ticket hypothesis." Moreover, the mechanism
identified in our work leads to a backpropagation-free method for feature
learning with any machine learning model. To demonstrate the effectiveness of
this feature learning mechanism, we use it to enable feature learning in
classical, non-feature learning models known as kernel machines and show that
the resulting models, which we refer to as Recursive Feature Machines, achieve
state-of-the-art performance on tabular data.
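The average gradient outer product at the heart of the ansatz can be sketched directly (a toy single-index model with an analytic gradient, not the paper's deep-network or Recursive Feature Machine implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# A model whose output depends on inputs only through one direction w_star.
w_star = np.array([3.0, 0.0, 0.0, 0.0])
f = lambda x: np.tanh(x @ w_star)
grad = lambda x: (1 - np.tanh(x @ w_star) ** 2)[:, None] * w_star  # analytic grad

X = rng.normal(size=(1000, 4))
# Average gradient outer product: (1/n) sum_i grad f(x_i) grad f(x_i)^T
G = grad(X)
agop = G.T @ G / len(X)

# Its top eigenvector recovers the feature direction the model relies on.
vals, vecs = np.linalg.eigh(agop)
top = vecs[:, -1]
print(np.abs(top @ w_star) / np.linalg.norm(w_star))  # ≈ 1
```

The AGOP up-weights exactly the direction strongly related to the model output, which is the mechanism the ansatz posits for feature learning.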
( 3 min )
In this paper we propose a new iterative algorithm to solve the fair PCA
(FPCA) problem. We start with the max-min fair PCA formulation originally
proposed in [1] and derive a simple and efficient iterative algorithm which is
based on the minorization-maximization (MM) approach. The proposed algorithm
relies on the relaxation of a semi-orthogonality constraint which is proved to
be tight at every iteration of the algorithm. The vanilla version of the
proposed algorithm requires solving a semi-definite program (SDP) at every
iteration, which can be further simplified to a quadratic program by
formulating the dual of the surrogate maximization problem. We also propose two
important reformulations of the fair PCA problem: a) fair robust PCA -- which
can handle outliers in the data, and b) fair sparse PCA -- which can enforce
sparsity on the estimated fair principal components. The proposed algorithms
are computationally efficient and monotonically increase their respective
design objectives at every iteration. An added feature of the proposed
algorithms is that they do not require the selection of any hyperparameter
(except for the fair sparse PCA case where a penalty parameter that controls
the sparsity has to be chosen by the user). We numerically compare the
performance of the proposed methods with two of the state-of-the-art approaches
on synthetic data sets and a real-life data set.
( 2 min )
We show the convergence of Wasserstein inverse reinforcement learning (WIRL)
for multi-objective optimizations with the projective subgradient method by
formulating an inverse problem of the optimization problem that is equivalent
to WIRL for multi-objective optimizations.
In addition, we prove convergence of inverse reinforcement learning (maximum
entropy inverse reinforcement learning and guided cost learning) for
multi-objective optimization with the projective subgradient method.
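The projective (projected) subgradient method referred to above can be illustrated on a one-dimensional toy problem (unrelated to the WIRL objective itself):

```python
import numpy as np

# Minimise f(x) = |x - 3| over the feasible set [0, 2] with the
# projected subgradient method; the solution is the boundary point x = 2.
def subgrad(x):
    return np.sign(x - 3.0)  # a subgradient of |x - 3|

def project(x, lo=0.0, hi=2.0):
    return np.clip(x, lo, hi)

x = 0.0
for t in range(1, 2001):
    x = project(x - (1.0 / t) * subgrad(x))  # diminishing step size 1/t

print(round(x, 3))  # → 2.0
```

Each iteration takes a subgradient step and projects back onto the constraint set; the diminishing step size is the standard choice for which convergence guarantees of this kind are proved.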
( 2 min )
We formulate a uniform tail bound for empirical processes indexed by a class
of functions, in terms of the individual deviations of the functions rather
than the worst-case deviation in the considered class. The tail bound is
established by introducing an initial "deflation" step to the standard generic
chaining argument. The resulting tail bound has a main complexity component, a
variant of Talagrand's $\gamma$ functional for the deflated function class, as
well as an instance-dependent deviation term, measured by an appropriately
scaled version of a suitable norm. Both of these terms are expressed using
certain coefficients formulated based on the relevant cumulant generating
functions. We also provide more explicit approximations for the mentioned
coefficients, when the function class lies in a given (exponential type) Orlicz
space.
( 2 min )
While machine learning is currently transforming the field of histopathology,
the domain lacks a comprehensive evaluation of state-of-the-art models based on
essential but complementary quality requirements beyond a mere classification
accuracy. In order to fill this gap, we developed a new methodology to
extensively evaluate a wide range of classification models, including recent
vision transformers and convolutional neural networks such as ConvNeXt,
ResNet (BiT), Inception, ViT, and Swin Transformer, with and without supervised
or self-supervised pretraining. We thoroughly tested the models on five widely
used histopathology datasets containing whole slide images of breast, gastric,
and colorectal cancer and developed a novel approach using an image-to-image
translation model to assess the robustness of a cancer classification model
against stain variations. Further, we extended existing interpretability
methods to previously unstudied models and systematically reveal insights into
the models' classification strategies that can be transferred to future model
architectures.
( 2 min )
This paper develops sparse-penalized deep neural network predictors for
learning weakly dependent processes, with a broad class of loss functions. We
deal with a general framework that includes regression estimation,
classification, time series prediction, and more. The $\psi$-weak dependence
structure is considered, and for the specific case of bounded observations,
$\theta_\infty$-coefficients are also used. In the $\theta_\infty$-weakly
dependent case, a non-asymptotic generalization bound within
the class of deep neural networks predictors is provided. For learning both
$\psi$ and $\theta_\infty$-weakly dependent processes, oracle inequalities for
the excess risk of the sparse-penalized deep neural networks estimators are
established. When the target function is sufficiently smooth, the convergence
rate of the excess risk is close to $\mathcal{O}(n^{-1/3})$. Some simulation
results are provided, and application to the forecast of the particulate matter
in the Vit\'{o}ria metropolitan area is also considered.
( 2 min )
Creative studio Elara Systems doesn’t shy away from sensitive subjects in its work.
( 6 min )
Artificial intelligence is teaming with crowdsourcing to improve mRNA vaccines’ thermostability — the ability to avoid breaking down under heat stress — making distribution more accessible worldwide. In this episode of the NVIDIA AI Podcast, host Noah Kravitz interviews Bojan Tunguz, a physicist and senior system software engineer, and Johnny Israeli, senior manager of AI […]
( 5 min )
You can now register machine learning (ML) models built in Amazon SageMaker Canvas with a single click to the Amazon SageMaker Model Registry, enabling you to operationalize ML models in production. Canvas is a visual interface that enables business analysts to generate accurate ML predictions on their own—without requiring any ML experience or having to […]
( 8 min )
Today, data scientists who are training deep learning models need to identify and remediate model training issues to meet accuracy targets for production deployment, and require a way to utilize standard tools for debugging model training. Among the data scientist community, TensorBoard is a popular toolkit that allows data scientists to visualize and analyze various […]
( 8 min )
Amazon SageMaker provides a broad selection of machine learning (ML) infrastructure and model deployment options to help meet your ML inference needs. It’s a fully-managed service and integrates with MLOps tools so you can work to scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden. SageMaker provides […]
( 7 min )
This is a guest post co-written with Trey Robinson, CTO at Sleepme Inc. Sleepme is an industry leader in sleep temperature management and monitoring products, including an Internet of Things (IoT) enabled sleep tracking sensor suite equipped with heart rate, respiration rate, bed and ambient temperature, humidity, and pressure sensors. Sleepme offers a smart mattress […]
( 6 min )
Understanding business trends, customer behavior, sales revenue, increase in demand, and buyer propensity all start with data. Exploring, analyzing, interpreting, and finding trends in data is essential for businesses to achieve successful outcomes. Business analysts play a pivotal role in facilitating data-driven business decisions through activities such as the visualization of business metrics and the […]
( 10 min )
Project Jupyter is a multi-stakeholder, open-source project that builds applications, open standards, and tools for data science, machine learning (ML), and computational science. The Jupyter Notebook, first released in 2011, has become a de facto standard tool used by millions of users worldwide across every possible academic, research, and industry sector. Jupyter enables users to […]
( 8 min )
Jupyter notebooks are highly favored by data scientists for their ability to interactively process data, build ML models, and test these models by making inferences on data. However, there are scenarios in which data scientists may prefer to transition from interactive development on notebooks to batch jobs. Examples of such use cases include scaling up […]
( 9 min )
Citadel founder and CEO Ken Griffin visits MIT, discusses how technology will continue to transform trading and investing.
( 9 min )
Models trained using common data-collection techniques judge rule violations more harshly than humans would, researchers report.
( 9 min )
Matt Shoulders will lead an interdisciplinary team to improve RuBisCO — the photosynthesis enzyme thought to be the holy grail for improving agricultural yield.
( 11 min )
A new computer vision system turns any shiny object into a camera of sorts, enabling an observer to see around corners or beyond obstructions.
( 10 min )
Amazon SageMaker Serverless Inference allows you to serve model inference requests in real time without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. You can let AWS handle the undifferentiated heavy lifting of managing the underlying infrastructure and save costs in the process. A Serverless Inference endpoint spins up […]
( 13 min )
Proteins drive many biological processes, such as enzyme activity, molecular transport, and cellular support. The three-dimensional structure of a protein provides insight into its function and how it interacts with other biomolecules. Experimental methods to determine protein structure, such as X-ray crystallography and NMR spectroscopy, are expensive and time-consuming. In contrast, recently-developed computational methods can […]
( 8 min )
Healthcare data is complex and siloed, and exists in various formats. An estimated 80% of data within organizations is considered to be unstructured or “dark” data that is locked inside text, emails, PDFs, and scanned documents. This data is difficult to interpret or analyze programmatically and limits how organizations can derive insights from it and […]
( 7 min )
Amazon SageMaker provides a number of options for users who are looking for a solution to host their machine learning (ML) models. Of these options, one of the key features that SageMaker provides is real-time inference. Real-time inference workloads can have varying levels of requirements and service level agreements (SLAs) in terms of latency and […]
( 15 min )
3D artist Milan Dey finds inspiration in games, movies, comics and pop culture. He drew from all of the above when creating a stunning 3D scene of Mayan ruins, The Hidden Temple of Itzamná, this week In the NVIDIA Studio.
( 7 min )
We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
( 4 min )
Researchers identify a property that helps computer vision models learn to represent the visual world in a more stable, predictable way.
( 10 min )
With Amazon SageMaker, you can manage the whole end-to-end machine learning (ML) lifecycle. It offers many native capabilities to help manage ML workflows aspects, such as experiment tracking, and model governance via the model registry. This post provides a solution tailored to customers that are already using MLflow, an open-source platform for managing ML workflows. […]
( 15 min )
Sometimes it can be very beneficial to use tools such as compilers that can modify and compile your models for optimal inference performance. In this post, we explore TensorRT and how to use it with Amazon SageMaker inference using NVIDIA Triton Inference Server. We explore how TensorRT works and how to host and optimize these […]
( 15 min )
Warning: very long, 2-part blog series. But this topic is too important to not carefully explain how we can educate and empower everyone to participate in the AI conversation. Our success as a society depends upon our ability to include everyone in this conversation. “I love it when a plan comes together” – Hannibal Smith,…
The post AI for Everyone: Learn How to Think Like a Data Scientist – Part 1 appeared first on Data Science Central.
( 22 min )
Machine learning in the healthcare industry.
The post How Machine Learning is Revolutionizing the Healthcare Industry appeared first on Data Science Central.
( 21 min )
Safety has been a critical issue for the deployment of learning-based
approaches in real-world applications. To address this issue, control barrier
function (CBF) and its variants have attracted extensive attention for
safety-critical control. However, due to the myopic one-step nature of CBF and
the lack of principled methods to design the class-$\mathcal{K}$ functions,
there are still fundamental limitations of current CBFs: optimality, stability,
and feasibility. In this paper, we propose a novel and unified approach to
address these limitations with Adaptive Multi-step Control Barrier Function
(AM-CBF), where we parameterize the class-$\mathcal{K}$ function by a neural
network and train it together with the reinforcement learning policy. Moreover,
to mitigate the myopic nature, we propose a novel \textit{multi-step training
and single-step execution} paradigm to make CBF farsighted while the execution
remains solving a single-step convex quadratic program. Our method is evaluated
on first- and second-order systems in various scenarios, where our approach
outperforms the conventional CBF both qualitatively and quantitatively.
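The single-step CBF quadratic program has a closed form in the scalar case, which makes the safety-filtering idea concrete (a single-integrator toy system with a hand-picked linear class-$\mathcal{K}$ function, not the learned one proposed above):

```python
import numpy as np

# Single-integrator x' = u with safe set {x >= 0}, barrier h(x) = x and a
# linear class-K function alpha(h) = k*h. The CBF-QP
#   min_u (u - u_nom)^2   s.t.   u + k*h(x) >= 0
# has the closed-form solution u = max(u_nom, -k*x).
def safe_control(x, u_nom, k=2.0):
    return max(u_nom, -k * x)

# Simulate a nominal controller that blindly drives x toward -1.
x, dt = 1.0, 0.01
for _ in range(1000):
    u = safe_control(x, u_nom=-1.0)
    x += dt * u

print(x >= 0.0)  # the filtered controller never leaves the safe set
```

The myopia criticized above is visible here: the constraint only looks one step ahead, which is what the multi-step training paradigm is designed to mitigate while keeping the execution-time problem this simple.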
( 2 min )
Small molecules in biological samples are studied to provide information
about disease states, environmental toxins, natural product drug discovery, and
many other applications. The primary window into the composition of small
molecule mixtures is tandem mass spectrometry (MS2), which produces data that
are of high sensitivity and part-per-million resolution. We adopt multi-scale
sinusoidal embeddings of the mass data in MS2 designed to meet the challenge of
learning from the full resolution of MS2 data. Using these embeddings, we
provide a new state of the art model for spectral library search, the standard
task for initial evaluation of MS2 data. We also introduce a new task, chemical
property prediction from MS2 data, that has natural applications in
high-throughput MS2 experiments and show that an average $R^2$ of 80\% for
novel compounds can be achieved across 10 chemical properties prioritized by
medicinal chemists. We use dimensionality reduction techniques and experiments
with different floating point resolutions to show the essential role
multi-scale sinusoidal embeddings play in learning from MS2 data.
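The multi-scale sinusoidal embedding idea can be sketched as follows (the embedding dimension and wavelength range below are illustrative guesses, not the paper's configuration):

```python
import numpy as np

def sinusoidal_embed(mass, dim=32, min_wave=1e-3, max_wave=1e4):
    """Multi-scale sinusoidal embedding of a scalar m/z value.

    Geometrically spaced wavelengths let the finest dimensions resolve
    sub-ppm mass differences while coarser ones capture overall scale.
    """
    waves = min_wave * (max_wave / min_wave) ** (np.arange(dim // 2) / (dim // 2 - 1))
    angles = 2 * np.pi * mass / waves
    return np.concatenate([np.sin(angles), np.cos(angles)])

a = sinusoidal_embed(500.2000)
b = sinusoidal_embed(500.2001)  # sub-ppm mass difference
c = sinusoidal_embed(600.2000)
# Nearby masses get nearby embeddings; distant masses do not.
print(np.linalg.norm(a - b) < np.linalg.norm(a - c))
```

This is the same construction as transformer positional encodings, applied to mass values so that a network can exploit the full floating-point resolution of the data.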
( 2 min )
Motivated by the recent success of Machine Learning tools in wireless
communications, the idea of semantic communication by Weaver from 1949 has
gained attention. It breaks with Shannon's classic design paradigm by aiming to
transmit the meaning, i.e., semantics, of a message instead of its exact
version, allowing for information rate savings. In this work, we apply the
Stochastic Policy Gradient (SPG) to design a semantic communication system by
reinforcement learning, not requiring a known or differentiable channel model -
a crucial step towards deployment in practice. Further, we motivate the use of
SPG for both classic and semantic communication from the maximization of the
mutual information between received and target variables. Numerical results
show that our approach achieves comparable performance to a model-aware
approach based on the reparametrization trick, albeit with a decreased
convergence rate.
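The key property of SPG exploited above, that it needs only samples of the reward rather than a differentiable channel model, can be sketched with a scalar Gaussian policy (toy black-box reward, not a communication system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Black-box "channel + receiver" reward: we can sample it but not
# differentiate through it (here: negative squared error to a target).
def reward(action):
    noisy = action + rng.normal(scale=0.1, size=action.shape)  # unknown channel
    return -(noisy - 2.0) ** 2

# Gaussian policy a ~ N(mu, sigma^2); score-function (REINFORCE) gradient:
#   grad_mu E[R] = E[R(a) * (a - mu) / sigma^2]
mu, sigma, lr = 0.0, 0.5, 0.05
for _ in range(3000):
    a = mu + sigma * rng.normal(size=64)
    r = reward(a)
    baseline = r.mean()  # variance-reduction baseline
    grad_mu = ((r - baseline) * (a - mu) / sigma**2).mean()
    mu += lr * grad_mu

print(round(mu, 2))  # the policy mean converges near the optimum at 2.0
```

The gradient estimate uses only sampled rewards and the policy's own log-density, which is why no channel model is required; the price, as noted above, is the slower convergence of such score-function estimators compared with reparametrization-based ones.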
( 2 min )
This paper considers the problem of helping humans exercise scalable
oversight over deep neural networks (DNNs). Adversarial examples can be useful
by helping to reveal weaknesses in DNNs, but they can be difficult to interpret
or draw actionable conclusions from. Some previous works have proposed using
human-interpretable adversarial attacks including copy/paste attacks in which
one natural image pasted into another causes an unexpected misclassification.
We build on these with two contributions. First, we introduce Search for
Natural Adversarial Features Using Embeddings (SNAFUE) which offers a fully
automated method for finding copy/paste attacks. Second, we use SNAFUE to red
team an ImageNet classifier. We reproduce copy/paste attacks from previous
works and find hundreds of other easily-describable vulnerabilities, all
without a human in the loop. Code is available at
https://github.com/thestephencasper/snafue
( 2 min )
We propose a novel graph-regularized neural network (GRNN) algorithm for tree
species classification. The proposed algorithm encompasses superpixel-based
segmentation for graph construction, a pixel-wise neural network classifier,
and the label propagation technique to generate an accurate and realistic
(emulating tree crowns) classification map on a sparsely annotated data set.
GRNN outperforms several state-of-the-art techniques not only for the standard
Indian Pines HSI but also achieves a high classification accuracy (approx. 92%)
on a new HSI data set collected over the heterogeneous forests of French Guiana
(FG) when less than 1% of the pixels are labeled. We further show that GRNN is
competitive with the state-of-the-art semi-supervised methods and exhibits a
small deviation in accuracy for different numbers of training samples and over
repeated trials with randomly sampled labeled pixels for training.
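The label propagation step can be illustrated on a toy graph (the standard normalized-adjacency iteration; the graph and hyperparameters are invented and do not reproduce the GRNN pipeline):

```python
import numpy as np

# Tiny graph: two 3-node clusters joined by one weak edge.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], float)

# One labelled node per cluster (node 0 -> class 0, node 5 -> class 1).
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

# Label propagation: F <- alpha * S @ F + (1 - alpha) * Y with the
# symmetrically normalised adjacency S = D^{-1/2} A D^{-1/2}.
d = A.sum(1)
S = A / np.sqrt(np.outer(d, d))
F, alpha = Y.copy(), 0.9
for _ in range(200):
    F = alpha * S @ F + (1 - alpha) * Y

labels = F.argmax(1)
print(labels)  # nodes 0-2 take class 0, nodes 3-5 take class 1
```

With under 1% of pixels labelled, as in the French Guiana experiments, this kind of propagation is what spreads sparse annotations across superpixel neighbourhoods.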
( 2 min )
Bayesian hierarchical mixture clustering (BHMC) improves traditional Bayesian
hierarchical clustering by replacing conventional Gaussian-to-Gaussian kernels
with a Hierarchical Dirichlet Process Mixture Model (HDPMM) for parent-to-child
diffusion in the generative process. However, BHMC may produce trees with high
nodal variance, indicating weak separation between nodes at higher levels. To
address this issue, we employ Posterior Regularization (PR), which imposes
max-margin constraints on nodes at every level to enhance cluster separation.
We illustrate how to apply PR to BHMC and demonstrate its effectiveness in
improving the BHMC model.
( 2 min )
Strong demand for autonomous vehicles and the wide availability of 3D sensors
are continuously fueling the proposal of novel methods for 3D object detection.
In this paper, we provide a comprehensive survey of recent developments from
2012-2021 in 3D object detection covering the full pipeline from input data,
over data representation and feature extraction to the actual detection
modules. We introduce fundamental concepts, focus on a broad range of different
approaches that have emerged over the past decade, and propose a
systematization that provides a practical framework for comparing these
approaches with the goal of guiding future development, evaluation and
application activities. Specifically, our survey and systematization of 3D
object detection (3DOD) models and methods can help researchers and
practitioners get a quick overview of the field by decomposing 3DOD solutions
into more manageable pieces.
( 2 min )
Photoplethysmogram (PPG) signals are easily contaminated by motion artifacts
in real-world settings, despite their widespread use in Internet-of-Things
(IoT) based wearable and smart health devices for cardiovascular health
monitoring. This study proposed a lightweight deep neural network, called
Tiny-PPG, for accurate and real-time PPG artifact segmentation on IoT edge
devices. The model was trained and tested on a public dataset, PPG DaLiA, which
featured complex artifacts with diverse lengths and morphologies during various
daily activities of 15 subjects using a watch-type device (Empatica E4). The
model structure, training method and loss function were specifically designed
to balance detection accuracy and speed for real-time PPG artifact detection in
resource-constrained embedded devices. To optimize the model size and
capability in multi-scale feature representation, the model employed depthwise
separable convolution and atrous spatial pyramid pooling modules, respectively.
Additionally, the contrastive loss was also utilized to further optimize the
feature embeddings. With additional model pruning, Tiny-PPG achieved
state-of-the-art detection accuracy of 87.8% while only having 19,726 model
parameters (0.15 megabytes), and was successfully deployed on an STM32 embedded
system for real-time PPG artifact detection. Therefore, this study provides an
effective solution for resource-constrained IoT smart health devices in PPG
artifact detection.
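The parameter saving from depthwise separable convolutions, one of the design choices above, is simple arithmetic (the channel and kernel sizes below are illustrative, not Tiny-PPG's actual configuration):

```python
# Parameter-count arithmetic behind the use of depthwise separable
# convolutions for small on-device models.
def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k            # one k-tap filter per (in, out) pair

def depthwise_separable_params(c_in, c_out, k):
    depthwise = c_in * k               # one k-tap filter per input channel
    pointwise = c_in * c_out           # 1x1 conv mixing channels
    return depthwise + pointwise

c_in, c_out, k = 64, 64, 9
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 1))  # 36864 4672 7.9
```

Factoring each convolution into a per-channel filter plus a 1x1 mixing step is what makes a sub-megabyte model like the one above feasible on an STM32-class device.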
( 2 min )
Natural language generation from structured data mainly focuses on
surface-level descriptions, suffering from uncontrollable content selection and
low fidelity. Previous works leverage logical forms to facilitate logical
knowledge-conditioned text generation. Though achieving remarkable progress,
they are data-hungry, which makes the adoption for real-world applications
challenging with limited data. To this end, this paper proposes a unified
framework for logical knowledge-conditioned text generation in the few-shot
setting. With only a few seed logical forms (e.g., 20/100 shot), our approach
leverages self-training and samples pseudo logical forms based on content and
structure consistency. Experimental results demonstrate that our approach can
obtain better few-shot performance than baselines.
( 2 min )
Achieving resource efficiency while preserving end-user experience is
non-trivial for cloud application operators. As cloud applications
progressively adopt microservices, resource managers are faced with two
distinct levels of system behavior: the end-to-end application latency and
per-service resource usage. Translation between these two levels, however, is
challenging because user requests traverse heterogeneous services that
collectively (but unevenly) contribute to the end-to-end latency. This paper
presents Autothrottle, a bi-level learning-assisted resource management
framework for SLO-targeted microservices. It architecturally decouples
mechanisms of application SLO feedback and service resource control, and
bridges them with the notion of performance targets. This decoupling enables
targeted control policies for these two mechanisms, where we combine
lightweight heuristics and learning techniques. We evaluate Autothrottle on
three microservice applications, with workload traces from production
scenarios. Results show its superior CPU resource saving, up to 26.21% over the
best-performing baseline, and up to 93.84% over all baselines.
( 2 min )
This document presents some early explorations of applying Softly Masked
Language Modelling (SMLM) to symbolic music generation. SMLM can be seen as a
generalisation of masked language modelling (MLM), where instead of each
element of the input set being either known or unknown, elements can be partly
known. We demonstrate some results of applying SMLM to constrained symbolic
music generation using a transformer encoder architecture. Several audio
examples are available at https://erl-j.github.io/smlm-web-supplement/
( 2 min )
ChatGPT is another entry in the line of large language models (LLMs), but due
to its performance and ability to converse effectively, it has gained enormous
popularity in the research as well as the industrial community. Recently, many
studies have been published to show the effectiveness, efficiency, integration,
and sentiments of chatGPT and other LLMs. In contrast, this study focuses on
the important aspects that are mostly overlooked, i.e. sustainability, privacy,
digital divide, and ethics and suggests that not only chatGPT but every
subsequent entry in the category of conversational bots should undergo
Sustainability, PrivAcy, Digital divide, and Ethics (SPADE) evaluation. This
paper discusses in detail the issues and concerns raised over chatGPT in
line with aforementioned characteristics. We support our hypothesis by some
preliminary data collection and visualizations along with hypothesized facts.
We also suggest mitigations and recommendations for each of the concerns.
Furthermore, we also suggest some policies and recommendations for an AI
policy act, should governments choose to design one.
( 2 min )
In recent years, numerous studies have applied deep learning to automatic
sleep stage classification. However, these works have paid little attention to
the cross-subject problem in sleep staging. At the same time, emerging
neuroscience theories on inter-subject correlations can provide new insights
for cross-subject analysis. This paper presents MViTime, a model for sleep
staging, and implements the inter-subject correlation theory through
contrastive learning, providing a feasible solution
to address the cross-subject problem in sleep stage classification. Finally,
to address the cross-subject problem in sleep stage classification. Finally,
experimental results and conclusions are presented, demonstrating that the
developed method has achieved state-of-the-art performance on sleep staging.
The results of the ablation experiment also demonstrate the effectiveness of
the cross-subject approach based on contrastive learning.
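The contrastive ingredient can be sketched with a standard InfoNCE-style loss (random toy embeddings stand in for per-subject representations; this is not the MViTime architecture):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE contrastive loss for one anchor, using cosine similarities."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                   # the positive should win

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
positive = anchor + 0.01 * rng.normal(size=16)   # a correlated view
negatives = [rng.normal(size=16) for _ in range(8)]
loss_aligned = info_nce(anchor, positive, negatives)
loss_shuffled = info_nce(anchor, rng.normal(size=16), negatives)
print(loss_aligned < loss_shuffled)  # aligned pairs yield lower loss
```

Minimizing such a loss pulls correlated views together and pushes unrelated samples apart, which is the mechanism a cross-subject contrastive objective relies on.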
( 2 min )
We use explainable neural networks to connect the evolutionary history of
dark matter halos with their density profiles. The network captures independent
factors of variation in the density profiles within a low-dimensional
representation, which we physically interpret using mutual information. Without
any prior knowledge of the halos' evolution, the network recovers the known
relation between the early time assembly and the inner profile, and discovers
that the profile beyond the virial radius is described by a single parameter
capturing the most recent mass accretion rate. The results illustrate the
potential for machine-assisted scientific discovery in complicated
astrophysical datasets.
( 2 min )
Selecting a minimal feature set that is maximally informative about a target
variable is a central task in machine learning and statistics. Information
theory provides a powerful framework for formulating feature selection
algorithms -- yet, a rigorous, information-theoretic definition of feature
relevancy, which accounts for feature interactions such as redundant and
synergistic contributions, is still missing. We argue that this lack is
inherent to classical information theory which does not provide measures to
decompose the information a set of variables provides about a target into
unique, redundant, and synergistic contributions. Such a decomposition has been
introduced only recently by the partial information decomposition (PID)
framework. Using PID, we clarify why feature selection is a conceptually
difficult problem when approached using information theory and provide a novel
definition of feature relevancy and redundancy in PID terms. From this
definition, we show that the conditional mutual information (CMI) maximizes
relevancy while minimizing redundancy and propose an iterative, CMI-based
algorithm for practical feature selection. We demonstrate the power of our
CMI-based algorithm in comparison to the unconditional mutual information on
benchmark examples and provide corresponding PID estimates to highlight how
PID quantifies the information contributions of features and their
interactions in feature-selection problems.
( 3
min )
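The iterative CMI rule described above, repeatedly adding the feature with the highest mutual information with the target conditioned on the features already chosen, can be sketched for discrete data as follows (an illustrative plug-in estimator, not the authors' implementation):

```python
import numpy as np

def mutual_info(x, y):
    # Plug-in estimate of I(X;Y) (in nats) from empirical joint counts.
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (x_idx.ravel(), y_idx.ravel()), 1)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def cond_mutual_info(x, y, z):
    # I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z)
    zs, z_idx = np.unique(z, return_inverse=True)
    z_idx = z_idx.ravel()
    total = 0.0
    for k in range(len(zs)):
        mask = z_idx == k
        if mask.sum() > 1:
            total += mask.mean() * mutual_info(x[mask], y[mask])
    return total

def greedy_cmi_select(X, y, k):
    # Iteratively add the feature with the highest CMI given those already chosen.
    n, d = X.shape
    selected = []
    z = np.zeros(n, dtype=int)  # joint code of selected features (constant at start)
    for _ in range(k):
        scores = [cond_mutual_info(X[:, j], y, z) if j not in selected else -np.inf
                  for j in range(d)]
        best = int(np.argmax(scores))
        selected.append(best)
        # fold the new feature into the joint conditioning code
        z = np.unique(np.stack([z, X[:, best]], axis=1),
                      axis=0, return_inverse=True)[1].ravel()
    return selected
```

In practice the plug-in CMI estimate degrades as the conditioning set grows, which is one reason careful estimation (and the PID perspective on what CMI actually trades off) matters.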
Detecting plagiarism involves finding similar items in two different sources.
In this article, we propose a novel method for detecting plagiarism that is
based on attention mechanism-based long short-term memory (LSTM) and
bidirectional encoder representations from transformers (BERT) word embedding,
enhanced with an optimized differential evolution (DE) method for pre-training
and a focal loss function for training. BERT can be included in a downstream
task and fine-tuned as a task-specific structure, while the pre-trained BERT
model is capable of detecting various linguistic characteristics. Unbalanced
classification is one of the primary issues with plagiarism detection. We
suggest a focal loss-based training technique that carefully learns minority
class instances to solve this. Another issue that we tackle is the training
phase itself, which typically employs gradient-based methods like
back-propagation for the learning process and thus suffers from some drawbacks,
including sensitivity to initialization. To initiate the BP process, we suggest
a novel DE algorithm that makes use of a clustering-based mutation operator.
Here, a winning cluster is identified for the current DE population, and a
fresh updating scheme is used to produce candidate solutions. We evaluate our
proposed approach on three benchmark datasets (MSRP, SNLI, and SemEval2014)
and demonstrate that it performs well when compared to both conventional and
population-based methods.
( 3
min )
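A minimal sketch of differential evolution with a clustering-based mutation in the spirit described above, where the centroid of the "winning" cluster serves as the base vector (illustrative only; the paper's exact operator and update scheme may differ):

```python
import numpy as np

def de_cluster_step(pop, fitness, objective, n_clusters=3, F=0.8, CR=0.9, rng=None):
    # One DE generation with a simplistic clustering-based mutation: the
    # population is clustered, the cluster with the best mean fitness
    # "wins", and its centroid serves as the base vector for mutation.
    if rng is None:
        rng = np.random.default_rng()
    n, d = pop.shape
    centers = pop[rng.choice(n, n_clusters, replace=False)].copy()
    for _ in range(5):  # a few rounds of Lloyd's algorithm
        labels = np.argmin(((pop[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = pop[labels == k].mean(axis=0)
    cluster_fit = [fitness[labels == k].mean() if (labels == k).any() else np.inf
                   for k in range(n_clusters)]
    base = centers[int(np.argmin(cluster_fit))]  # centroid of the winning cluster
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        r1, r2 = rng.choice(n, size=2, replace=False)
        mutant = base + F * (pop[r1] - pop[r2])
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True        # ensure at least one mutant gene
        trial = np.where(cross, mutant, pop[i])
        f = objective(trial)
        if f < fitness[i]:                   # greedy one-to-one selection
            new_pop[i], new_fit[i] = trial, f
    return new_pop, new_fit
```

Because selection is greedy, the best fitness is non-increasing across generations; the clustering only changes where mutants are centered.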
This work investigates pretrained audio representations for few shot Sound
Event Detection. We specifically address the task of few shot detection of
novel acoustic sequences, or sound events with semantically meaningful temporal
structure, without assuming access to non-target audio. We develop procedures
for pretraining suitable representations, and methods which transfer them to
our few shot learning scenario. Our experiments evaluate the general purpose
utility of our pretrained representations on AudioSet, and the utility of
proposed few shot methods via tasks constructed from real-world acoustic
sequences. Our pretrained embeddings are suitable to the proposed task, and
enable multiple aspects of our few shot framework.
( 2
min )
Recent years have witnessed a proliferation of traffic accidents, which has
spurred wide research on Automated Vehicle (AV) technologies to reduce vehicle
accidents, especially on risk assessment frameworks for AV technologies.
However, existing time-based frameworks cannot handle complex traffic scenarios
and ignore the influence of each moving object's motion tendency on the risk
distribution, leading to performance degradation. To address this problem, we
propose a novel comprehensive driving risk management framework named RCP-RF,
based on potential field theory under a Connected and Automated Vehicles (CAV)
environment, in which a pedestrian risk metric is combined into a unified
road-vehicle driving risk management framework. Different from existing
algorithms, the motion tendency between the ego and obstacle cars and the
pedestrian factor are explicitly considered in the proposed framework, which
improves the performance of the driving risk model. Moreover, the proposed
method requires only $O(N^2)$ time complexity. Empirical studies validate the
superiority of our proposed framework against state-of-the-art methods on the
real-world NGSIM dataset and a real AV platform.
( 2
min )
We study principal component analysis (PCA), where given a dataset in
$\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that
approximately maximizes the variance of the distribution after being projected
along $v$. Despite being a classical task, standard estimators fail drastically
if the data contains even a small fraction of outliers, motivating the problem
of robust PCA. Recent work has developed computationally-efficient algorithms
for robust PCA that either take super-linear time or have sub-optimal error
guarantees. Our main contribution is to develop a nearly-linear time algorithm
for robust PCA with near-optimal error guarantees. We also develop a
single-pass streaming algorithm for robust PCA with memory usage nearly-linear
in the dimension.
( 2
min )
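The outlier sensitivity motivating robust PCA is easy to reproduce: the standard estimator below recovers the top direction on clean data, but corrupting 5% of the points redirects it entirely (a toy demonstration, not the paper's algorithm):

```python
import numpy as np

def top_pc(X):
    # Standard (non-robust) estimator: leading eigenvector of the
    # empirical covariance matrix.
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / len(X))
    return vecs[:, -1]  # eigenvector for the largest eigenvalue

rng = np.random.default_rng(0)
d = 20
true_v = np.zeros(d)
true_v[0] = 1.0
X = rng.normal(size=(2000, d))
X[:, 0] *= np.sqrt(5.0)          # variance 5 along e_1, variance 1 elsewhere
v_clean = top_pc(X)

X_bad = X.copy()                  # corrupt 5% of points: far-away cluster along e_2
X_bad[:100] = 40.0 * np.eye(d)[1] + rng.normal(scale=0.1, size=(100, d))
v_dirty = top_pc(X_bad)

print(abs(v_clean @ true_v))      # close to 1: correct direction recovered
print(abs(v_dirty @ true_v))      # far from 1: estimator hijacked by the outliers
```

Robust PCA algorithms aim to recover something close to `v_clean` even from the corrupted sample, ideally in time nearly linear in the input size.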
Analysis of Electrochemical Impedance Spectroscopy (EIS) data for
electrochemical systems often consists of defining an Equivalent Circuit Model
(ECM) using expert knowledge and then optimizing the model parameters to
deconvolute various resistance, capacitive, inductive, or diffusion responses.
For small data sets, this procedure can be conducted manually; however, it is
not feasible to manually define a proper ECM for extensive data sets with a
wide range of EIS responses. Automatic identification of an ECM would
substantially accelerate the analysis of large sets of EIS data. We showcase
machine learning methods to classify the ECMs of 9,300 impedance spectra
provided by QuantumScape for the BatteryDEV hackathon. The best-performing
approach is a gradient-boosted tree model utilizing a library to automatically
generate features, followed by a random forest model using the raw spectral
data. A convolutional neural network using boolean images of Nyquist
representations is presented as an alternative, although it achieves a lower
accuracy. We publish the data and open source the associated code. The
approaches described in this article can serve as benchmarks for further
studies. A key remaining challenge is the identifiability of the labels,
underlined by the model performances and the comparison of misclassified
spectra.
( 3
min )
We provide a psychometrically grounded exposition of bias and fairness as applied
to a typical machine learning pipeline for affective computing. We expand on an
interpersonal communication framework to elucidate how to identify sources of
bias that may arise in the process of inferring human emotions and other
psychological constructs from observed behavior. Various methods and metrics
for measuring fairness and bias are discussed along with pertinent implications
within the United States legal context. We illustrate how to measure some types
of bias and fairness in a case study involving automatic personality and
hireability inference from multimodal data collected in video interviews for
mock job applications. We encourage affective computing researchers and
practitioners to encapsulate bias and fairness in their research processes and
products and to consider their role, agency, and responsibility in promoting
equitable and just systems.
( 2
min )
This paper evaluates the viability of using fixed language models for
training text classification networks on low-end hardware. We combine language
models with a CNN architecture and put together a comprehensive benchmark with
8 datasets covering single-label and multi-label classification of topic,
sentiment, and genre. Our observations are distilled into a list of trade-offs,
concluding that there are scenarios where not fine-tuning a language model
yields competitive effectiveness at faster training, requiring only a quarter
of the memory compared to fine-tuning.
( 2
min )
In this work, we propose a novel evolutionary algorithm for neural
architecture search, applicable to global search spaces. The algorithm's
architectural representation organizes the topology in multiple hierarchical
modules, while the design process exploits this representation, in order to
explore the search space. We also employ a curation system, which promotes the
propagation of well-performing sub-structures to subsequent generations. We
apply our method to Fashion-MNIST and NAS-Bench101, achieving accuracies of
$93.2\%$ and $94.8\%$ respectively in a relatively small number of generations.
( 2
min )
Many machine learning (ML) libraries are accessible online for ML
practitioners. Typical ML pipelines are complex and consist of a series of
steps, each of them invoking several ML libraries. In this demo paper, we
present ExeKGLib, a Python library that allows users with coding skills and
minimal ML knowledge to build ML pipelines. ExeKGLib relies on knowledge graphs
to improve the transparency and reusability of the built ML workflows, and to
ensure that they are executable. We demonstrate the usage of ExeKGLib and
compare it with conventional ML code to show its benefits.
( 2
min )
Numerical models are used widely for parameter reconstructions in the field
of optical nanometrology. To obtain the geometrical parameters of a
nanostructured line grating, we fit a finite element numerical model to an
experimental data set by using the Bayesian target vector optimization method.
Gaussian process surrogate models are trained during the reconstruction.
Afterwards, we employ a Markov chain Monte Carlo sampler on the surrogate
models to determine the full model parameter distribution for the reconstructed
model parameters. The choice of numerical discretization parameters, like the
polynomial order of the finite element ansatz functions, impacts the numerical
discretization error of the forward model. In this study we investigate the
impact of numerical discretization parameters of the forward problem on the
reconstructed parameters as well as on the model parameter distributions. We
show that such a convergence study makes it possible to determine numerical
parameters that yield efficient and accurate reconstruction results.
( 2
min )
We describe how interpretable boosting algorithms based on ridge-regularized
generalized linear models can be used to analyze high-dimensional environmental
data. We illustrate this by using environmental, social, human and biophysical
data to predict the financial vulnerability of farmers in Chile and Tunisia
against climate hazards. We show how group structures can be considered and how
interactions can be found in high-dimensional datasets using a novel 2-step
boosting approach. The advantages and efficacy of the proposed method are shown
and discussed. Results indicate that the presence of interaction effects only
improves predictive power when included in two-step boosting. The most
important variable in predicting all types of vulnerability is natural
assets. Other important variables are the type of irrigation, economic assets,
and the presence of crop damage on nearby farms.
( 2
min )
This paper formulates a general cross validation framework for signal
denoising. The general framework is then applied to nonparametric regression
methods such as Trend Filtering and Dyadic CART. The resulting cross validated
versions are then shown to attain nearly the same rates of convergence as are
known for the optimally tuned analogues. No previous theoretical analyses of
cross-validated versions of Trend Filtering or Dyadic CART existed. To
illustrate the generality of the framework, we also propose and study
cross validated versions of two fundamental estimators; lasso for high
dimensional linear regression and singular value thresholding for matrix
estimation. Our general framework is inspired by the ideas in Chatterjee and
Jafarov (2015) and is potentially applicable to a wide range of estimation
methods which use tuning parameters.
( 2
min )
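As a toy instance of such a framework, one can cross-validate the tuning parameter of soft-thresholding (the lasso with identity design) against an independent noisy copy of the same signal. This is an illustrative scheme under our own simplifying assumptions; the paper's cross-validation construction for Trend Filtering and Dyadic CART is more involved:

```python
import numpy as np

def soft_threshold(y, lam):
    # Coordinate-wise soft-thresholding: the lasso solution when the
    # design matrix is the identity.
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def cv_threshold(y_train, y_val, lams):
    # Pick the threshold whose estimate from one noisy copy best
    # predicts an independent second noisy copy of the same signal.
    errs = [((soft_threshold(y_train, lam) - y_val) ** 2).mean() for lam in lams]
    return lams[int(np.argmin(errs))]

rng = np.random.default_rng(0)
n = 5000
theta = np.zeros(n)
theta[:50] = 8.0                         # sparse mean vector
y1 = theta + rng.normal(size=n)          # "training" copy
y2 = theta + rng.normal(size=n)          # held-out validation copy
lam_hat = cv_threshold(y1, y2, np.linspace(0.0, 6.0, 25))
theta_hat = soft_threshold(y1, lam_hat)
print(lam_hat, ((theta_hat - theta) ** 2).mean())
```

The validation error decomposes into the estimation risk plus a constant noise term, so minimizing it over the grid picks a threshold close to the risk-optimal one.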
This paper provides several mathematical analyses of the diffusion model in
machine learning. The drift term of the backwards sampling process is
represented as a conditional expectation involving the data distribution and
the forward diffusion. The training process aims to find such a drift function
by minimizing the mean-squared residue related to the conditional expectation.
Using small-time approximations of the Green's function of the forward
diffusion, we show that the analytical mean drift function in DDPM and the
score function in SGM asymptotically blow up in the final stages of the
sampling process for singular data distributions such as those concentrated on
lower-dimensional manifolds, and are therefore difficult to approximate by a
network. To overcome this difficulty, we derive a new target function and
associated loss, which remains bounded even for singular data distributions. We
illustrate the theoretical findings with several numerical examples.
( 2
min )
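The blow-up described above can be sketched in common score-based-model notation (our paraphrase, not necessarily the paper's exact formulation). If the forward process has marginals $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, then by Tweedie's formula the score is a conditional expectation:

```latex
\nabla_x \log p_t(x) \;=\; -\,\frac{x - \alpha_t\, \mathbb{E}[x_0 \mid x_t = x]}{\sigma_t^2}.
```

As $t \to 0$ we have $\sigma_t \to 0$, so the training target scales like $1/\sigma_t^2$; for data supported on a lower-dimensional manifold the numerator does not vanish off the manifold, and the target diverges, which is the singular behaviour the bounded reformulation is designed to avoid.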
Clustering is at the very core of machine learning, and its applications
proliferate with the increasing availability of data. However, as datasets
grow, comparing clusterings with an adjustment for chance becomes
computationally difficult, preventing unbiased ground-truth comparisons and
solution selection. We propose FastAMI, a Monte Carlo-based method to
efficiently approximate the Adjusted Mutual Information (AMI) and extend it to
the Standardized Mutual Information (SMI). The approach is compared with the
exact calculation and a recently developed variant of the AMI based on pairwise
permutations, using both synthetic and real data. In contrast to the exact
calculation, our method is fast enough to enable these adjusted
information-theoretic comparisons for large datasets, while remaining
considerably more accurate than the pairwise approach.
( 2
min )
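The Monte Carlo idea can be sketched directly: estimate the expected mutual information under the permutation null by shuffling one labelling, instead of evaluating the exact (and expensive) combinatorial formula. This is a simplified sketch, not FastAMI's actual sampling scheme:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mutual_info_labels(a, b):
    # I(A;B) = H(A) + H(B) - H(A,B), computed from the label arrays
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    joint = ai.astype(np.int64) * (bi.max() + 1) + bi
    return entropy(a) + entropy(b) - entropy(joint)

def ami_monte_carlo(a, b, n_perm=200, rng=None):
    # Adjusted Mutual Information, with the expected MI under the
    # permutation null estimated by Monte Carlo (shuffling one labelling)
    # rather than via the exact hypergeometric formula.
    if rng is None:
        rng = np.random.default_rng()
    mi = mutual_info_labels(a, b)
    emi = np.mean([mutual_info_labels(rng.permutation(a), b)
                   for _ in range(n_perm)])
    denom = 0.5 * (entropy(a) + entropy(b)) - emi
    return (mi - emi) / denom
```

Identical clusterings score 1 by construction, and independent clusterings score near 0 even when the raw MI is inflated by chance agreement.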
The analysis of large-scale time-series network data, such as social media
and email communications, remains a significant challenge for graph analysis
methodology. In particular, the scalability of graph analysis is a critical
issue hindering further progress in large-scale downstream inference. In this
paper, we introduce a novel approach called "temporal encoder embedding" that
can efficiently embed large amounts of graph data with linear complexity. We
apply this method to an anonymized time-series communication network from a
large organization spanning 2019-2020, consisting of over 100 thousand vertices
and 80 million edges. Our method embeds the data within 10 seconds on a
standard computer and enables the detection of communication pattern shifts for
individual vertices, vertex communities, and the overall graph structure.
Through supporting theory and synthetic studies, we demonstrate the theoretical
soundness of our approach under random graph models and its numerical
effectiveness in simulations.
( 2
min )
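One simple construction in the encoder-embedding family achieves the linear complexity described above: represent each vertex by its normalized edge counts into each community, in a single pass over the edge list. This is a simplified static sketch under our own assumptions; the paper's temporal method additionally slices edges by time window and its normalization may differ:

```python
import numpy as np

def encoder_embedding(edges, labels, n_vertices):
    # Vertex i is represented by its normalized number of edges into
    # each community; one pass over the edge list, hence O(#edges).
    k = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=k).astype(float)
    Z = np.zeros((n_vertices, k))
    for u, v in edges:
        Z[u, labels[v]] += 1.0 / counts[labels[v]]
        Z[v, labels[u]] += 1.0 / counts[labels[u]]
    return Z
```

Shifts in a vertex's communication pattern then show up directly as movement of its row of `Z` across time slices.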
Jeff Wilke SM '93, former CEO of Amazon’s Worldwide Consumer business, brings his LGO playbook to his new mission of revitalizing manufacturing in the U.S.
( 12
min )
In this post, we discuss a machine learning (ML) solution for complex image searches using Amazon Kendra and Amazon Rekognition. Specifically, we use the example of architecture diagrams for complex images due to their incorporation of numerous different visual icons and text. With the internet, searching and obtaining an image has never been easier. Most […]
( 17
min )
This is a joint post co-written by AWS and Voxel51. Voxel51 is the company behind FiftyOne, the open-source toolkit for building high-quality datasets and computer vision models. A retail company is building a mobile app to help customers buy clothes. To create this app, they need a high-quality dataset containing clothing images, labeled with different […]
( 16
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Automated revenue cycle management (RCM) is becoming an increasingly vital component in the healthcare industry, streamlining and accurately processing complex billing tasks. Using artificial intelligence (AI) and machine learning capabilities will help ensure accuracy, minimize human error, and free up personnel to focus on more important tasks. In addition, utilizing AI for RCM will…
The post Maximizing Revenue in Psychology Practices: Leveraging AI for Billing Optimization appeared first on Data Science Central.
( 21
min )
The world of artificial intelligence (AI) and machine learning (ML) has been witnessing a paradigm shift with the rise of generative AI models that can create human-like text, images, code, and audio. Compared to classical ML models, generative AI models are significantly bigger and more complex. However, their increasing complexity also comes with high costs […]
( 12
min )
Time series forecasting refers to the process of predicting future values of time series data (data that is collected at regular intervals over time). Simple methods for time series forecasting use historical values of the same variable whose future values need to be predicted, whereas more complex, machine learning (ML)-based methods use additional information, such […]
( 16
min )
Generative AI is gaining a lot of public attention at present, with talk around products such as GPT4, ChatGPT, DALL-E2, Bard, and many other AI technologies. Many customers have been asking for more information on AWS’s generative AI solutions. The aim of this post is to address those needs. This post provides an overview of […]
( 10
min )
Diffusion models have been used to generate photorealistic images and short videos, compose music, and synthesize speech. In a new paper, Microsoft Researchers explore how they can be used to imitate human behavior in interactive environments.
The post Using generative AI to imitate human behavior appeared first on Microsoft Research.
( 11
min )
Kris Kersey is an embedded software developer with over 20 years of experience, an educational YouTuber with 30,000+ subscribers, and a lifelong lover of comics and cosplay. These interests and expertise came together in his first-ever project using the NVIDIA Jetson platform for edge AI and robotics, when he created a fully functional superhero helmet.
( 6
min )
What has it got in its pocketses? More games coming in May, that’s what. GFN Thursday gets the summer started early with two newly supported games this week and 16 more coming later this month — including The Lord of the Rings: Gollum. Don’t forget to take advantage of the limited-time discount on six-month Priority.
( 6
min )
The system they developed eliminates a source of bias in simulations, leading to improved algorithms that can boost the performance of applications.
( 9
min )
eXplainable artificial intelligence (XAI) methods have emerged to convert the
black box of machine learning models into a more digestible form. These methods
help to communicate how the model works with the aim of making machine learning
models more transparent and increasing the trust of end-users in their
output. SHapley Additive exPlanations (SHAP) and Local Interpretable
Model-Agnostic Explanations (LIME) are two widely used XAI methods,
particularly with
tabular data. In this commentary piece, we discuss the way the explainability
metrics of these two methods are generated and propose a framework for
interpretation of their outputs, highlighting their weaknesses and strengths.
( 2
min )
This paper describes our submission to the MEDIQA-Chat 2023 shared task for
automatic clinical note generation from doctor-patient conversations. We report
results for two approaches: the first fine-tunes a pre-trained language model
(PLM) on the shared task data, and the second uses few-shot in-context learning
(ICL) with a large language model (LLM). Both achieve high performance as
measured by automatic metrics (e.g. ROUGE, BERTScore), ranking second and
first, respectively, among all submissions to the shared task. Expert human
scrutiny indicates that notes generated via the ICL-based approach with GPT-4
are preferred about as often as human-written notes, making it a promising path
toward automated note generation from doctor-patient conversations.
( 2
min )
Contrastively trained encoders have recently been proven to invert the
data-generating process: they encode each input, e.g., an image, into the true
latent vector that generated the image (Zimmermann et al., 2021). However,
real-world observations often have inherent ambiguities. For instance, images
may be blurred or only show a 2D view of a 3D object, so multiple latents could
have generated them. This makes the true posterior for the latent vector
probabilistic with heteroscedastic uncertainty. In this setup, we extend the
common InfoNCE objective and encoders to predict latent distributions instead
of points. We prove that these distributions recover the correct posteriors of
the data-generating process, including its level of aleatoric uncertainty, up
to a rotation of the latent space. In addition to providing calibrated
uncertainty estimates, these posteriors allow the computation of credible
intervals in image retrieval. They comprise images with the same latent as a
given query, subject to its uncertainty. Code is available at
https://github.com/mkirchhof/Probabilistic_Contrastive_Learning
( 2
min )
Differentiable particle filters are an emerging class of particle filtering
methods that use neural networks to construct and learn parametric state-space
models. In real-world applications, both the state dynamics and measurements
can switch between a set of candidate models. For instance, in target tracking,
vehicles can idle, move through traffic, or cruise on motorways, and
measurements are collected in different geographical or weather conditions.
This paper proposes a new differentiable particle filter for regime-switching
state-space models. The method can learn a set of unknown candidate dynamic and
measurement models and track the state posteriors. We evaluate the novel
algorithm on relevant models, demonstrating strong performance compared to
competing algorithms.
( 2
min )
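The underlying filtering problem can be illustrated with a plain (non-differentiable) bootstrap particle filter for a regime-switching model: each particle carries a state and a regime index, regimes evolve via a Markov chain, and states via the active regime's dynamics. This is a sampling-based sketch under our own toy assumptions; the paper's differentiable variant parameterizes these components with neural networks and learns them:

```python
import numpy as np

def rs_bootstrap_pf(ys, dynamics, trans, obs_std, proc_std=0.3,
                    n_particles=500, rng=None):
    # Bootstrap particle filter for a scalar regime-switching model.
    if rng is None:
        rng = np.random.default_rng()
    n_regimes = len(dynamics)
    x = rng.normal(size=n_particles)                 # particle states
    r = rng.integers(n_regimes, size=n_particles)    # particle regimes
    means = []
    for y in ys:
        # propagate each particle's regime, then its state
        r = np.array([rng.choice(n_regimes, p=trans[ri]) for ri in r])
        x = np.array([dynamics[ri](xi) for ri, xi in zip(r, x)])
        x = x + rng.normal(scale=proc_std, size=n_particles)
        # reweight by the observation likelihood and resample
        logw = -0.5 * ((y - x) / obs_std) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, r = x[idx], r[idx]
        means.append(x.mean())                       # posterior mean estimate
    return np.array(means)
```

Replacing the hand-specified `dynamics` and likelihood with learnable networks, and the resampling with a differentiable surrogate, gives the kind of filter the paper studies.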
We present a new method for functional tissue unit segmentation at the
cellular level, which utilizes the latest deep learning semantic segmentation
approaches together with domain adaptation and semi-supervised learning
techniques. This approach minimizes the domain gap and class imbalance, and
accounts for the influence of capture settings between the HPA and HuBMAP
datasets. The presented approach achieves results comparable with the state of
the art in functional tissue unit segmentation at the cellular level. The
source code is
available at https://github.com/VSydorskyy/hubmap_2022_htt_solution
( 2
min )
Prior research has investigated the impact of various linguistic features on
cross-lingual transfer performance. In this study, we investigate the manner in
which this effect can be mapped onto the representation space. While past
studies have focused on the impact on cross-lingual alignment in multilingual
language models during fine-tuning, this study examines the absolute evolution
of the respective language representation spaces produced by MLLMs. We place a
specific emphasis on the role of linguistic characteristics and investigate
their inter-correlation with the impact on representation spaces and
cross-lingual transfer performance. Additionally, this paper provides
preliminary evidence of how these findings can be leveraged to enhance transfer
to linguistically distant languages.
( 2
min )
We present a study using new computational methods, based on a novel
combination of machine learning for inferring admixture hidden Markov models
and probabilistic model checking, to uncover interaction styles in a mobile
app. These styles are then used to inform a redesign, which is implemented,
deployed, and then analysed using the same methods. The data sets are logged
user traces, collected over two six-month deployments of each version,
involving thousands of users and segmented into different time intervals. The
methods do not assume tasks or absolute metrics such as measures of engagement,
but uncover the styles through unsupervised inference of clusters and analysis
with probabilistic temporal logic. For both versions there was a clear
distinction between the styles adopted by users during the first day/week/month
of usage, and during the second and third months, a result we had not
anticipated.
( 2
min )
Respiratory syncytial virus (RSV) is one of the most dangerous respiratory
diseases for infants and young children. Due to the nonpharmaceutical
interventions (NPIs) imposed during the COVID-19 outbreak, the seasonal
transmission pattern of RSV was disrupted in 2020 and then shifted months
ahead in 2021 in the northern hemisphere. It is critical to understand how
COVID-19 impacts RSV and to build predictive algorithms to forecast the timing
and intensity of RSV reemergence in post-COVID-19 seasons. In this paper, we
propose a deep coupled tensor factorization machine, dubbed DeCom, for
post-COVID-19 RSV prediction. DeCom leverages tensor factorization and residual
modeling. It enables us to learn the disrupted RSV transmission reliably under
COVID-19 by taking both the regular seasonal RSV transmission pattern and the
NPIs into consideration. Experimental results on a real RSV dataset show that
DeCom is more accurate than the state-of-the-art RSV prediction algorithms and
achieves up to 46% lower root mean square error and 49% lower mean absolute
error for country-level prediction compared to the baselines.
( 2
min )
In an effort to address the training instabilities of GANs, we introduce a
class of dual-objective GANs with different value functions (objectives) for
the generator (G) and discriminator (D). In particular, we model each objective
using $\alpha$-loss, a tunable classification loss, to obtain
$(\alpha_D,\alpha_G)$-GANs, parameterized by $(\alpha_D,\alpha_G)\in
(0,\infty]^2$. For sufficiently large number of samples and capacities for G
and D, we show that the resulting non-zero sum game simplifies to minimizing an
$f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$. In the
finite sample and capacity setting, we define estimation error to quantify the
gap in the generator's performance relative to the optimal setting with
infinite samples and obtain upper bounds on this error, showing it to be order
optimal under certain conditions. Finally, we highlight the value of tuning
$(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic
2D Gaussian mixture ring and the Stacked MNIST datasets.
( 2
min )
An old problem in multivariate statistics is that linear Gaussian models are
often unidentifiable, i.e. some parameters cannot be uniquely estimated. In
factor (component) analysis, an orthogonal rotation of the factors is
unidentifiable, while in linear regression, the direction of effect cannot be
identified. For such linear models, non-Gaussianity of the (latent) variables
has been shown to provide identifiability. In the case of factor analysis, this
leads to independent component analysis, while in the case of the direction of
effect, non-Gaussian versions of structural equation modelling solve the
problem. More recently, we have shown how even general nonparametric nonlinear
versions of such models can be estimated. Non-Gaussianity is not enough in this
case, but assuming we have time series, or that the distributions are suitably
modulated by some observed auxiliary variables, the models are identifiable.
This paper reviews the identifiability theory for the linear and nonlinear
cases, considering both factor analytic models and structural equation models.
( 2
min )
We establish matching upper and lower generalization error bounds for
mini-batch Gradient Descent (GD) training with either deterministic or
stochastic, data-independent, but otherwise arbitrary batch selection rules. We
consider smooth Lipschitz-convex/nonconvex/strongly-convex loss functions, and
show that classical upper bounds for Stochastic GD (SGD) also hold verbatim for
such arbitrary nonadaptive batch schedules, including all deterministic ones.
Further, for convex and strongly-convex losses we prove matching lower bounds
directly on the generalization error uniform over the aforementioned class of
batch schedules, showing that all such batch schedules generalize optimally.
Lastly, for smooth (non-Lipschitz) nonconvex losses, we show that full-batch
(deterministic) GD is essentially optimal, among all possible batch schedules
within the considered class, including all stochastic ones.
( 2
min )
We consider a general $p$-norm objective for experimental design problems
that captures some well-studied objectives (D/A/E-design) as special cases. We
prove that a randomized local search approach provides a unified algorithm to
solve this problem for all $p$. This provides the first approximation algorithm
for the general $p$-norm objective, and a nice interpolation of the best known
bounds of the special cases.
( 2
min )
The idea of adversarial learning of regularization functionals has recently
been introduced in the wider context of inverse problems. The intuition behind
this method is the realization that it is not only necessary to learn the basic
features that make up a class of signals one wants to represent, but also, or
even more so, which features to avoid in the representation. In this paper, we
will apply this approach to the problem of source separation by means of
non-negative matrix factorization (NMF) and present a new method for the
adversarial training of NMF bases. We show in numerical experiments, both for
image and audio separation, that this leads to a clear improvement of the
reconstructed signals, in particular in the case where little or no strong
supervision data is available.
( 2
min )
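For reference, the non-adversarial baseline that adversarially trained bases build on is classical NMF, e.g. with Lee-Seung multiplicative updates for the Frobenius objective (a standard sketch, not the paper's adversarial training itself):

```python
import numpy as np

def nmf(V, rank, n_iter=500, rng=None):
    # Classical NMF via Lee-Seung multiplicative updates for the
    # Frobenius objective ||V - W H||_F^2. Updates keep W, H nonnegative
    # and monotonically decrease the objective.
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + 1e-3
    H = rng.random((rank, n)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In source separation, the columns of `W` act as learned spectral bases per source; the adversarial approach additionally trains those bases to avoid features characteristic of the interfering sources.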
Respiratory syncytial virus (RSV) is one of the most dangerous respiratory
diseases for infants and young children. Due to the nonpharmaceutical
intervention (NPI) measures imposed during the COVID-19 outbreak, the seasonal
transmission pattern of RSV was interrupted in 2020 and then shifted months earlier in
2021 in the northern hemisphere. It is critical to understand how COVID-19
impacts RSV and build predictive algorithms to forecast the timing and
intensity of RSV reemergence in post-COVID-19 seasons. In this paper, we
propose a deep coupled tensor factorization machine, dubbed DeCom, for
post-COVID-19 RSV prediction. DeCom leverages tensor factorization and residual
modeling. It enables us to learn the disrupted RSV transmission reliably under
COVID-19 by taking both the regular seasonal RSV transmission pattern and the
NPI into consideration. Experimental results on a real RSV dataset show that
DeCom is more accurate than the state-of-the-art RSV prediction algorithms and
achieves up to 46% lower root mean square error and 49% lower mean absolute
error for country-level prediction compared to the baselines.
( 2
min )
A collaborative research team from the MIT-Takeda Program combined physics and machine learning to characterize rough particle surfaces in pharmaceutical pills and powders.
( 8
min )
Generative AI (GenAI) and large language models (LLMs), such as those available soon via Amazon Bedrock and Amazon Titan are transforming the way developers and enterprises are able to solve traditionally complex challenges related to natural language processing and understanding. Some of the benefits offered by LLMs include the ability to create more capable and […]
( 12
min )
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for ML. It provides a single, web-based visual interface where you can perform all machine learning (ML) development steps required to build, train, tune, debug, deploy, and monitor models. It gives data scientists all the tools you need to take ML models from experimentation […]
( 13
min )
New generations of CPUs offer a significant performance improvement in machine learning (ML) inference due to specialized built-in instructions. Combined with their flexibility, high speed of development, and low operating cost, these general-purpose processors offer an alternative to other existing hardware solutions. AWS, Arm, Meta and others helped optimize the performance of PyTorch 2.0 inference […]
( 6
min )
This post is co-written by Jyoti Sharma and Sharmo Sarkar from Vericast. For any machine learning (ML) problem, the data scientist begins by working with data. This includes gathering, exploring, and understanding the business and technical aspects of the data, along with evaluation of any manipulations that may be needed for the model building process. […]
( 13
min )
Collaboration is key to bringing ideas from lab to life. In the first episode of the #MSRPodcast series “Collaborators,” learn how GitHub’s Kasia Sitkiewicz and Protocol Labs’ Petar Maymounkov are teaming up to make open-source collaborative work better.
The post Collaborators: Gov4git with Petar Maymounkov and Kasia Sitkiewicz appeared first on Microsoft Research.
( 31
min )
This work lists and describes the main recent strategies for building
fixed-length, dense and distributed representations for words, based on the
distributional hypothesis. These representations are now commonly called word
embeddings and, in addition to encoding surprisingly good syntactic and
semantic information, have been proven useful as extra features in many
downstream NLP tasks.
( 2
min )
Adversarial training, which enhances robustness against adversarial attacks,
has received much attention because it is easy to generate human-imperceptible
perturbations of data that deceive a given deep neural network. In this paper,
we propose a new adversarial training algorithm that is
theoretically well motivated and empirically superior to other existing
algorithms. A novel feature of the proposed algorithm is to apply more
regularization to data vulnerable to adversarial attacks than other existing
regularization algorithms do. Theoretically, we show that our algorithm can be
understood as an algorithm of minimizing the regularized empirical risk
motivated from a newly derived upper bound of the robust risk. Numerical
experiments illustrate that our proposed algorithm improves the generalization
(accuracy on examples) and robustness (accuracy on adversarial attacks)
simultaneously to achieve the state-of-the-art performance.
( 2
min )
We introduce Robust Exploration via Clustering-based Online Density
Estimation (RECODE), a non-parametric method for novelty-based exploration that
estimates visitation counts for clusters of states based on their similarity in
a chosen embedding space. By adapting classical clustering to the nonstationary
setting of Deep RL, RECODE can efficiently track state visitation counts over
thousands of episodes. We further propose a novel generalization of the inverse
dynamics loss, which leverages masked transformer architectures for multi-step
prediction and which, in conjunction with RECODE, achieves a new state-of-the-art in
a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets new
state-of-the-art in hard exploration Atari games, and is the first agent to
reach the end screen in "Pitfall!".
( 2
min )
Gaussian copula mixture models (GCMMs) generalize Gaussian mixture models
(GMMs) using the concept of a copula. This paper gives their mathematical
definition and studies the properties of the likelihood function. Based on
these properties, extended expectation-maximization algorithms are developed to
estimate the parameters of the mixture of copulas, while the marginal
distribution corresponding to each component is estimated with separate
nonparametric statistical methods. In experiments, GCMM achieves better
goodness of fit than GMM given the same number of clusters; furthermore, GCMM
can exploit unsynchronized data in each dimension to mine the data more deeply.
( 2
min )
AV1, the next-generation video codec, is expanding its reach with today’s release of OBS Studio 29.1. This latest software update adds support for AV1 streaming to YouTube over Enhanced RTMP. All GeForce RTX 40 Series GPUs — including laptop GPUs and the recently launched GeForce RTX 4070 — support real-time AV1 hardware encoding, providing 40% […]
( 5
min )
NVIDIA today introduced a wave of cutting-edge AI research that will enable developers and artists to bring their ideas to life — whether still or moving, in 2D or 3D, hyperrealistic or fantastical. Around 20 NVIDIA Research papers advancing generative AI and neural graphics — including collaborations with over a dozen universities in the U.S., […]
( 8
min )
Content creator Grant Abbitt embodies selflessness, one of the best qualities that a creative can possess. Passionate about giving back to the creative community, Abbitt offers inspiration, guidance and free education for others in his field through YouTube tutorials.
( 7
min )
One of the most popular models available today is XGBoost. With the ability to solve various problems such as classification and regression, XGBoost has become a popular option that also falls into the category of tree-based models. In this post, we dive deep to see how Amazon SageMaker can serve these models using NVIDIA Triton […]
( 18
min )
Machine learning (ML) helps organizations generate revenue, reduce costs, mitigate risk, drive efficiencies, and improve quality by optimizing core business functions across multiple business units such as marketing, manufacturing, operations, sales, finance, and customer service. With AWS ML, organizations can accelerate the value creation from months to days. Amazon SageMaker Canvas is a visual, point-and-click […]
( 8
min )
Today, we announce the availability of sample notebooks that demonstrate question answering tasks using a Retrieval Augmented Generation (RAG)-based approach with large language models (LLMs) in Amazon SageMaker JumpStart. Text generation using RAG with LLMs enables you to generate domain-specific text outputs by supplying specific external data as part of the context fed to LLMs. […]
( 13
min )
Big tech must weigh AI’s risks vs. rewards: In an interview with the New York Times, Hinton noted the pace of AI advancement is far beyond what he and other tech experts predicted. Hinton said that Google acted very responsibly while he worked on its AI development efforts. His concerns are due to AI’s…
The post DSC Weekly 2 May 2023 – Big tech must weigh AI’s risks vs. rewards appeared first on Data Science Central.
( 19
min )
Discover the differences between AI, machine learning, and deep learning in this comprehensive guide. Learn how each technology works, their key applications, and the skills required for a career in data science.
The post AI vs Machine Learning vs Deep Learning appeared first on Data Science Central.
( 23
min )
The rapid adoption of smart phones and other mobile platforms has generated an enormous amount of image data. According to Gartner, unstructured data now represents 80–90% of all new enterprise data, but just 18% of organizations are taking advantage of this data. This is mainly due to a lack of expertise and the large amount […]
( 9
min )
Irene Politkoff, Founder and Chief Product Evangelist at semantic modeling tools provider TopQuadrant, posted this description of the large language model (LLM) ChatGPT: “ChatGPT doesn’t access a database of facts to answer your questions. Instead, its responses are based on patterns that it saw in the training data. So ChatGPT is not always trustworthy.” Georgetown…
The post Can we boost the confidence scores of LLM answers with the help of knowledge graphs? appeared first on Data Science Central.
( 20
min )
Customers from Japan to Ecuador and Sweden are using NVIDIA DGX H100 systems like AI factories to manufacture intelligence. They’re creating services that offer AI-driven insights in finance, healthcare, law, IT and telecom — and working to transform their industries in the process. Among the dozens of use cases, one aims to predict how factory […]
( 6
min )
Textual backdoor attacks pose a practical threat to existing systems, as they
can compromise the model by inserting imperceptible triggers into inputs and
manipulating labels in the training dataset. With cutting-edge generative
models such as GPT-4 pushing rewriting to extraordinary levels, such attacks
are becoming even harder to detect. We conduct a comprehensive investigation of
the role of black-box generative models as a backdoor attack tool, highlighting
the importance of researching relative defense strategies. In this paper, we
reveal that the proposed generative model-based attack, BGMAttack, could
effectively deceive textual classifiers. Compared with the traditional attack
methods, BGMAttack makes the backdoor trigger less conspicuous by leveraging
state-of-the-art generative models. Our extensive evaluation of attack
effectiveness across five datasets, complemented by three distinct human
cognition assessments, reveals that BGMAttack achieves comparable attack
performance while maintaining superior stealthiness relative to baseline
methods.
( 2
min )
We present a new approach, the Topograph, which reconstructs underlying
physics processes, including the intermediary particles, by leveraging
underlying priors from the nature of particle physics decays and the
flexibility of message passing graph neural networks. The Topograph not only
solves the combinatoric assignment of observed final state objects, associating
them to their original mother particles, but directly predicts the properties
of intermediate particles in hard scatter processes and their subsequent
decays. In comparison to standard combinatoric approaches or modern approaches
using graph neural networks, which scale exponentially or quadratically, the
complexity of Topographs scales linearly with the number of reconstructed
objects.
We apply Topographs to top quark pair production in the all hadronic decay
channel, where we outperform the standard approach and match the performance of
the state-of-the-art machine learning technique.
( 2
min )
With recent advancements in computer vision as well as machine learning (ML),
video-based at-home exercise evaluation systems have become a popular topic of
current research. However, performance depends heavily on the amount of
available training data. Since labeled datasets specific to exercising are
rare, we propose a method that makes use of the abundance of fitness videos
available online. Specifically, we utilize the advantage that videos often not
only show the exercises, but also provide language as an additional source of
information. With push-ups as an example, we show that through the analysis of
subtitle data using natural language processing (NLP), it is possible to create
a labeled (irrelevant, relevant correct, relevant incorrect) dataset containing
relevant information for pose analysis. In particular, we show that irrelevant
clips ($n=332$) have significantly different joint visibility values compared
to relevant clips ($n=298$). Inspecting cluster centroids also reveals
different poses for the different classes.
( 2
min )
The recent advent of play-to-earn (P2E) systems in massively multiplayer
online role-playing games (MMORPGs) has made in-game goods interchangeable with
real-world values more than ever before. The goods in the P2E MMORPGs can be
directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn
via blockchain networks. Unlike traditional in-game goods, once P2E goods have
been written to the blockchain, they cannot be restored by the game operation
teams, even in cases of chargeback fraud such as payment fraud, cancellation, or
refund. To tackle the problem, we propose a novel chargeback fraud prediction
method, PU GNN, which leverages graph attention networks with a PU loss to
capture both the players' in-game behavior and their P2E token transaction patterns.
With the adoption of modified GraphSMOTE, the proposed model handles the
imbalanced distribution of labels in chargeback fraud datasets. The conducted
experiments on three real-world P2E MMORPG datasets demonstrate that PU GNN
achieves superior performances over previously suggested methods.
( 2
min )
Recent work has shown that simple linear models can outperform several
Transformer based approaches in long term time-series forecasting. Motivated by
this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model,
Time-series Dense Encoder (TiDE), for long-term time-series forecasting that
enjoys the simplicity and speed of linear models while also being able to
handle covariates and non-linear dependencies. Theoretically, we prove that the
simplest linear analogue of our model can achieve near optimal error rate for
linear dynamical systems (LDS) under some assumptions. Empirically, we show
that our method can match or outperform prior approaches on popular long-term
time-series forecasting benchmarks while being 5-10x faster than the best
Transformer based model.
( 2
min )
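The linear baseline that motivates the abstract above can be sketched in plain NumPy: a direct multi-horizon linear map from a lookback window to the forecast horizon, fit by least squares. This is an illustrative baseline in the spirit of the cited "simple linear models", not TiDE itself; the series, lookback, and horizon are made-up choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic seasonal series with mild noise.
t = np.arange(2000)
series = np.sin(2 * np.pi * t / 50) + 0.05 * rng.normal(size=t.size)

L, H = 100, 24          # lookback length and forecast horizon

# Build (lookback -> horizon) training pairs with a sliding window.
n = series.size - L - H
Xw = np.stack([series[i:i + L] for i in range(n)])
Yw = np.stack([series[i + L:i + L + H] for i in range(n)])

# Direct multi-horizon linear forecaster, fit by least squares.
W, *_ = np.linalg.lstsq(Xw, Yw, rcond=None)

# Forecast the H steps following the last full window.
last = series[-L - H:-H]
pred = last @ W
mse = np.mean((pred - series[-H:]) ** 2)
print(mse)
```

On this seasonal toy series the linear map forecasts down to roughly the noise level, which is the strength TiDE aims to keep while adding covariates and non-linearities.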
Computer vision methods have been shown to be effective in classifying garbage
into recycling categories for waste processing, but existing methods are
costly, imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile
application that uses computer vision and deep learning techniques to classify
waste materials as trash, plastic, paper, metal, glass or cardboard. Its
effectiveness was tested on various neural network architectures and real-world
images, achieving an average precision of 92\% on the test set. This app can
help combat climate change by enabling efficient waste processing and reducing
the generation of greenhouse gases caused by incorrect waste disposal.
( 2
min )
Detecting an abrupt distributional shift of the data stream, known as
change-point detection, is a fundamental problem in statistics and signal
processing. We present a new approach for online change-point detection by
training neural networks (NN), and sequentially cumulating the detection
statistics by evaluating the trained discriminating function on test samples by
a CUSUM recursion. The idea is based on the observation that training neural
networks with the logistic loss yields an approximation of the log-likelihood
ratio. We demonstrate the good performance of NN-CUSUM in detecting change
points in high-dimensional data using both synthetic and real-world data.
( 2
min )
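The CUSUM recursion at the heart of the method above is short enough to show directly. In this sketch the trained discriminating function is replaced by the exact Gaussian log-likelihood ratio for a known mean shift (an illustrative stand-in for the network's output); the change point, shift size, and threshold are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stream with a mean shift at t = 500: N(0,1) -> N(1,1).
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 500)])

# CUSUM recursion S_t = max(0, S_{t-1} + g(x_t)). With a trained
# discriminator, g would be its logit; here we use the exact Gaussian
# log-likelihood ratio g(x) = mu*x - mu^2/2 for shift mu = 1.
mu = 1.0
g = mu * x - mu**2 / 2
S = np.zeros(x.size)
for t in range(1, x.size):
    S[t] = max(0.0, S[t - 1] + g[t])

threshold = 20.0
alarm = int(np.argmax(S > threshold))   # first index above threshold
print(alarm)
```

The statistic drifts down before the change and up after it, so the first threshold crossing lands shortly after t = 500; a learned discriminator plays the role of g when the pre- and post-change densities are unknown.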
Feedback from active galactic nuclei (AGN) and supernovae can affect
measurements of integrated SZ flux of halos ($Y_\mathrm{SZ}$) from CMB surveys,
and cause its relation with the halo mass ($Y_\mathrm{SZ}-M$) to deviate from
the self-similar power-law prediction of the virial theorem. We perform a
comprehensive study of such deviations using CAMELS, a suite of hydrodynamic
simulations with extensive variations in feedback prescriptions. We use a
combination of two machine learning tools (random forest and symbolic
regression) to search for analogues of the $Y-M$ relation which are more robust
to feedback processes for low masses ($M\lesssim 10^{14}\, h^{-1} \, M_\odot$);
we find that simply replacing $Y\rightarrow Y(1+M_*/M_\mathrm{gas})$ in the
relation makes it remarkably self-similar. This could serve as a robust
multiwavelength mass proxy for low-mass clusters and galaxy groups. Our
methodology can also be generally useful to improve the domain of validity of
other astrophysical scaling relations.
We also forecast that measurements of the $Y-M$ relation could provide
percent-level constraints on certain combinations of feedback parameters and/or
rule out a major part of the parameter space of supernova and AGN feedback
models used in current state-of-the-art hydrodynamic simulations. Our results
can be useful for using upcoming SZ surveys (e.g., SO, CMB-S4) and galaxy
surveys (e.g., DESI and Rubin) to constrain the nature of baryonic feedback.
Finally, we find that an alternative relation, $Y-M_*$, provides information on
feedback complementary to that from $Y-M$.
( 3
min )
The Hopfield model is a paradigmatic model of neural networks that has been
analyzed for many decades in the statistical physics, neuroscience, and machine
learning communities. Inspired by the manifold hypothesis in machine learning,
we propose and investigate a generalization of the standard setting that we
name Random-Features Hopfield Model. Here $P$ binary patterns of length $N$ are
generated by applying to Gaussian vectors sampled in a latent space of
dimension $D$ a random projection followed by a non-linearity. Using the
replica method from statistical physics, we derive the phase diagram of the
model in the limit $P,N,D\to\infty$ with fixed ratios $\alpha=P/N$ and
$\alpha_D=D/N$. Besides the usual retrieval phase, where the patterns can be
dynamically recovered from some initial corruption, we uncover a new phase
where the features characterizing the projection can be recovered instead. We
call this phenomenon the learning phase transition, as the features are not
explicitly given to the model but rather are inferred from the patterns in an
unsupervised fashion.
( 2
min )
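The setup described above is easy to simulate: generate binary patterns by passing random projections of latent Gaussian vectors through a sign non-linearity, store them with the Hebbian rule, and check retrieval from a corrupted state. This is a plain-NumPy illustration of the model's pattern-generation and retrieval dynamics at a small load, not the replica analysis; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, P = 400, 40, 5      # neurons, latent dimension, patterns (small load)

# Random-features patterns: latent Gaussians -> random projection -> sign.
F = rng.normal(size=(N, D)) / np.sqrt(D)   # random feature map
c = rng.normal(size=(D, P))                # latent vectors
xi = np.sign(F @ c)                        # P binary patterns of length N

# Hebbian couplings (self-couplings removed).
J = (xi @ xi.T) / N
np.fill_diagonal(J, 0.0)

# Corrupt pattern 0 by flipping 10% of its bits, then run the dynamics.
s = xi[:, 0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
s[flip] *= -1
for _ in range(20):                        # synchronous sign updates
    s = np.sign(J @ s)

overlap = float(s @ xi[:, 0]) / N
print(overlap)
```

At this load the dynamics clean up the corruption and the overlap with the stored pattern returns close to 1; the paper's interest is in what happens as $\alpha$ and $\alpha_D$ grow, where a feature-recovery phase appears instead.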
Recently, quantum classifiers have been found to be vulnerable to adversarial
attacks, in which quantum classifiers are deceived by imperceptible noises,
leading to misclassification. In this paper, we present the first theoretical
study demonstrating that adding quantum random rotation noise can improve
robustness in quantum classifiers against adversarial attacks. We link this to
the definition of differential privacy and show that a quantum classifier
trained in the natural presence of additive noise is differentially private. Finally,
we derive a certified robustness bound to enable quantum classifiers to defend
against adversarial examples, supported by experimental results simulated with
noise from IBM's 7-qubit device.
( 2
min )
Semantic segmentation models classifying hyperspectral images (HSI) are
vulnerable to adversarial examples. Traditional approaches to adversarial
robustness focus on training or retraining a single network on attacked data,
however, in the presence of multiple attacks these approaches decrease in
performance compared to networks trained individually on each attack. To combat
this issue we propose an Adversarial Discriminator Ensemble Network (ADE-Net)
which focuses on attack type detection and adversarial robustness under a
unified model to optimally preserve per-attack-type weights while robustifying
the overall network. In the proposed method, a discriminator network is used to
separate data by attack type into their specific attack-expert ensemble
network.
( 2
min )
We study reward poisoning attacks on online deep reinforcement learning
(DRL), where the attacker is oblivious to the learning algorithm used by the
agent and the dynamics of the environment. We demonstrate the intrinsic
vulnerability of state-of-the-art DRL algorithms by designing a general,
black-box reward poisoning framework called adversarial MDP attacks. We
instantiate our framework to construct two new attacks which only corrupt the
rewards for a small fraction of the total training timesteps and make the agent
learn a low-performing policy. We provide a theoretical analysis of the
efficiency of our attack and perform an extensive empirical evaluation. Our
results show that our attacks efficiently poison agents learning in several
popular classical control and MuJoCo environments with a variety of
state-of-the-art DRL algorithms, such as DQN, PPO, SAC, etc.
( 2
min )
Voxel-based 3D object classification has been thoroughly studied in recent
years. Most previous methods convert the classic 2D convolution into a 3D form
that will be further applied to objects with binary voxel representation for
classification. However, the binary voxel representation is not very effective
for 3D convolution in many cases. In this paper, we propose a hybrid cascade
architecture for voxel-based 3D object classification. It consists of three
stages composed of fully connected and convolutional layers, dealing with easy,
moderate, and hard 3D models respectively. Both accuracy and speed can be
balanced in our proposed method. By giving each voxel a signed distance value,
an obvious gain in accuracy can be observed. In addition, the mean inference
time is greatly reduced compared with state-of-the-art point-cloud- and
voxel-based methods.
( 2
min )
SplitFed Learning, a combination of Federated and Split Learning (FL and SL),
is one of the most recent developments in the decentralized machine learning
domain. In SplitFed learning, a model is trained by clients and a server
collaboratively. For image segmentation, labels are created at each client
independently and, therefore, are subject to clients' bias, inaccuracies, and
inconsistencies. In this paper, we propose a data quality-based adaptive
averaging strategy for SplitFed learning, called QA-SplitFed, to cope with the
variation of annotated ground truth (GT) quality over multiple clients. The
proposed method is compared against five state-of-the-art model averaging
methods on the task of learning human embryo image segmentation. Our
experiments show that all five baseline methods fail to maintain accuracy as
the number of corrupted clients increases. QA-SplitFed, however, copes
effectively with corruption as long as there is at least one uncorrupted
client.
( 2
min )
A stochastic-gradient-based interior-point algorithm for minimizing a
continuously differentiable objective function (that may be nonconvex) subject
to bound constraints is presented, analyzed, and demonstrated through
experimental results. The algorithm is unique from other interior-point methods
for solving smooth (nonconvex) optimization problems since the search
directions are computed using stochastic gradient estimates. It is also unique
in its use of inner neighborhoods of the feasible region -- defined by a
positive and vanishing neighborhood-parameter sequence -- in which the iterates
are forced to remain. It is shown that with a careful balance between the
barrier, step-size, and neighborhood sequences, the proposed algorithm
satisfies convergence guarantees in both deterministic and stochastic settings.
The results of numerical experiments show that in both settings the algorithm
can outperform a projected-(stochastic)-gradient method.
( 2
min )
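The central idea, stochastic gradient steps whose iterates are forced to stay in a shrinking inner neighborhood of the feasible box, can be sketched with clipping in place of a full interior-point barrier. This is a simplified illustration of the neighborhood mechanism under made-up sequences, not the paper's algorithm; the objective, noise level, and step/neighborhood schedules are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimize f(w) = 0.5 * ||w - target||^2 subject to 0 <= w <= 1, where
# the unconstrained minimizer lies partly outside the box.
target = np.array([0.3, 1.5, -0.4])
l, u = 0.0, 1.0
w = np.full(3, 0.5)

for k in range(1, 2001):
    grad = (w - target) + 0.05 * rng.normal(size=3)  # stochastic gradient estimate
    theta = 0.25 / k**0.5        # vanishing inner-neighborhood parameter
    alpha = 1.0 / k**0.75        # diminishing step size
    # Iterates are forced to remain in the inner box [l + theta, u - theta].
    w = np.clip(w - alpha * grad, l + theta, u - theta)

print(w)
```

As the neighborhood parameter vanishes, the iterates approach the projection of `target` onto the box (interior on the first coordinate, at the boundary on the other two), illustrating how the balance between step-size and neighborhood sequences drives convergence.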
Privacy-preserving machine learning solutions have recently gained
significant attention. One promising research trend is using Homomorphic
Encryption (HE), a method for performing computation over encrypted data. One
major challenge in this approach is training HE-friendly, encrypted or
unencrypted, deep CNNs with decent accuracy. We propose a novel training method
for HE-friendly models, and demonstrate it on fundamental and modern CNNs, such
as ResNet and ConvNeXt. After training, we evaluate our models by running
encrypted samples with the HELayers SDK and verifying that they yield the desired
results. When running on a GPU over the ImageNet dataset, our ResNet-18/50/101
implementations take only 7, 31 and 57 minutes, respectively, which shows that
this solution is practical. Furthermore, we present several insights on
handling the activation functions and skip-connections under HE. Finally, we
demonstrate in an unprecedented way how to perform secure zero-shot prediction
using a CLIP model that we adapted to be HE-friendly.
( 2
min )
The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two
different problems for the first time in a research competition under
well-defined conditions: In the Emotion Share Sub-Challenge, a regression on
speech has to be made; and in the Requests Sub-Challenges, requests and
complaints need to be detected. We describe the Sub-Challenges, baseline
feature extraction, and classifiers based on the usual ComParE features, the
auDeep toolkit, and deep feature extraction from pre-trained CNNs using the
DeepSpectrum toolkit; in addition, wav2vec2 models are used.
( 2
min )
Patient-independent detection of epileptic activities based on visual
spectral representation of continuous EEG (cEEG) has been widely used for
diagnosing epilepsy. However, precise detection remains a considerable
challenge due to subtle variabilities across subjects, channels and time
points. Thus, capturing fine-grained, discriminative features of EEG patterns,
which is associated with high-frequency textural information, is yet to be
resolved. In this work, we propose Scattering Transformer (ScatterFormer), an
invariant scattering transform-based hierarchical Transformer that specifically
pays attention to subtle features. In particular, the disentangled
frequency-aware attention (FAA) enables the Transformer to capture clinically
informative high-frequency components, offering a novel clinical explainability
based on visual encoding of multichannel EEG signals. Evaluations on two
distinct tasks of epileptiform detection demonstrate the effectiveness of our
method. Our proposed model achieves a median AUCROC of 98.14% and an accuracy of
96.39% in patients with Rolandic epilepsy. On a neonatal seizure detection
benchmark, it outperforms the state-of-the-art by 9% in terms of average
AUCROC.
( 2
min )
This paper targets the perceptual task of separating the different
interacting voices, i.e., monophonic melodic streams, in a polyphonic musical
piece. We target symbolic music, where notes are explicitly encoded, and model
this task as a Multi-Trajectory Tracking (MTT) problem from discrete
observations, i.e., notes in a pitch-time space. Our approach builds a graph
from a musical piece, by creating one node for every note, and separates the
melodic trajectories by predicting a link between two notes if they are
consecutive in the same voice/stream. This kind of local, greedy prediction is
made possible by node embeddings created by a heterogeneous graph neural
network that can capture inter- and intra-trajectory information. Furthermore,
we propose a new regularization loss that encourages the output to respect the
MTT premise of at most one incoming and one outgoing link for every node,
favouring monophonic (voice) trajectories; this loss function might also be
useful in other general MTT scenarios. Our approach does not use
domain-specific heuristics, is scalable to longer sequences and a higher number
of voices, and can handle complex cases such as voice inversions and overlaps.
We reach new state-of-the-art results for the voice separation task in
classical music of different styles.
( 2
min )
Graph Neural Networks (GNNs) are a form of deep learning that enable a wide
range of machine learning applications on graph-structured data. The learning
of GNNs, however, is known to pose challenges for memory-constrained devices
such as GPUs. In this paper, we study exact compression as a way to reduce the
memory requirements of learning GNNs on large graphs. In particular, we adopt a
formal approach to compression and propose a methodology that transforms GNN
learning problems into provably equivalent compressed GNN learning problems. In
a preliminary experimental evaluation, we give insights into the compression
ratios that can be obtained on real-world graphs and apply our methodology to
an existing GNN benchmark.
( 2
min )
Kernelized Stein discrepancy (KSD) is a score-based discrepancy widely used
in goodness-of-fit tests. It can be applied even when the target distribution
has an unknown normalising factor, such as in Bayesian analysis. We show
theoretically and empirically that the KSD test can suffer from low power when
the target and the alternative distribution have the same well-separated modes
but differ in mixing proportions. We propose to perturb the observed sample via
Markov transition kernels, with respect to which the target distribution is
invariant. This allows us to then employ the KSD test on the perturbed sample.
We provide numerical evidence that with suitably chosen kernels the proposed
approach can lead to a substantially higher power than the KSD test.
( 2
min )
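The KSD statistic the abstract builds on can be computed in closed form for a simple target. The sketch below implements a 1-D V-statistic estimate of the squared KSD with an RBF kernel and the score of a standard normal target, showing that samples matching the target score near zero while a shifted alternative scores high (this illustrates the base KSD test, not the proposed Markov-kernel perturbation; sample sizes and bandwidth are illustrative).

```python
import numpy as np

def ksd_vstat(x, score, h=1.0):
    """V-statistic estimate of the (squared) kernelized Stein discrepancy
    in 1D with RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * h**2))
    dkx = -d / h**2 * k                    # d/dx k(x, y)
    dky = d / h**2 * k                     # d/dy k(x, y)
    dkxy = (1 / h**2 - d**2 / h**4) * k    # d^2/(dx dy) k(x, y)
    s = score(x)
    # Stein kernel u_p(x, y), summed over all pairs.
    up = s[:, None] * s[None, :] * k + s[:, None] * dky + s[None, :] * dkx + dkxy
    return up.mean()

rng = np.random.default_rng(0)
score = lambda x: -x                       # score of the standard normal target

x_good = rng.normal(0, 1, 500)             # matches the target
x_bad = rng.normal(2, 1, 500)              # shifted alternative
print(ksd_vstat(x_good, score), ksd_vstat(x_bad, score))
```

The failure mode studied in the abstract is different: when the alternative has the same well-separated modes as the target but wrong mixing proportions, this statistic can stay small, which is what perturbing the sample with target-invariant Markov kernels is designed to expose.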
Experimental data often comprises variables measured independently, at
different sampling rates (non-uniform ${\Delta}$t between successive
measurements); and at a specific time point only a subset of all variables may
be sampled. Approaches to identifying dynamical systems from such data
typically use interpolation, imputation or subsampling to reorganize or modify
the training data $\textit{prior}$ to learning. Partial physical knowledge may
also be available $\textit{a priori}$ (accurately or approximately), and
data-driven techniques can complement this knowledge. Here we exploit neural
network architectures based on numerical integration methods and $\textit{a
priori}$ physical knowledge to identify the right-hand side of the underlying
governing differential equations. Iterates of such neural-network models allow
for learning from data sampled at arbitrary time points $\textit{without}$ data
modification. Importantly, we integrate the network with available partial
physical knowledge in "physics informed gray-boxes"; this enables learning
unknown kinetic rates or microbial growth functions while simultaneously
estimating experimental parameters.
( 2
min )
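The iterate idea above can be illustrated with a plain NumPy sketch: a gray-box right-hand side combining a known physics term with a residual that would normally be a trained network (here a placeholder returning zero), integrated by forward-Euler iterates directly at irregular sampling times, with no interpolation or resampling of the data. The decay rate, time grid, and substep count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gray-box right-hand side: known physics plus a residual that would
# normally be a trained neural network (placeholder here).
def known_physics(x):
    return -0.5 * x                 # a priori knowledge: linear decay

def learned_residual(x):
    return 0.0 * x                  # stands in for the trained network

def rhs(x):
    return known_physics(x) + learned_residual(x)

# Forward-Euler iterates of the model at arbitrary, non-uniform time points.
t = np.sort(rng.uniform(0.0, 4.0, 80))   # irregular sampling times
x = np.empty_like(t)
x[0] = 1.0
for i in range(1, t.size):
    dt = t[i] - t[i - 1]
    xi = x[i - 1]
    for _ in range(20):                  # Euler substeps within each interval
        xi = xi + (dt / 20) * rhs(xi)
    x[i] = xi

err = np.max(np.abs(x - np.exp(-0.5 * t)))
print(err)
```

Because the iterate consumes whatever $\Delta t$ the data provides, the same structure can be trained against observations sampled at arbitrary times; in the paper's setting the residual term absorbs unknown kinetics while the known term carries the a priori physics.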
Recent work has shown that simple linear models can outperform several
Transformer based approaches in long term time-series forecasting. Motivated by
this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model,
Time-series Dense Encoder (TiDE), for long-term time-series forecasting that
enjoys the simplicity and speed of linear models while also being able to
handle covariates and non-linear dependencies. Theoretically, we prove that the
simplest linear analogue of our model can achieve near optimal error rate for
linear dynamical systems (LDS) under some assumptions. Empirically, we show
that our method can match or outperform prior approaches on popular long-term
time-series forecasting benchmarks while being 5-10x faster than the best
Transformer based model.
( 2
min )
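The core building block of such an MLP encoder-decoder is a residual dense block. A minimal sketch follows; the actual TiDE block also includes layer normalisation and dropout, and the shapes here are illustrative:

```python
import numpy as np

def residual_mlp_block(x, w1, b1, w2, b2):
    """One residual dense block: Linear -> ReLU -> Linear, plus a skip
    connection (layer norm and dropout omitted for brevity)."""
    h = np.maximum(x @ w1 + b1, 0.0)   # hidden layer with ReLU
    return x + (h @ w2 + b2)           # project back and add the skip
```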
We provide exact expressions for the 1-Wasserstein distance between
independent location-scale distributions. The expressions are represented using
location and scale parameters and special functions such as the standard
Gaussian CDF or the Gamma function. Specifically, we find that the
1-Wasserstein distance between independent univariate location-scale
distributions is equivalent to the mean of a folded distribution within the
same family whose underlying location and scale are equal to the difference of
the locations and scales of the original distributions. A new linear upper
bound on the 1-Wasserstein distance is presented and the asymptotic bounds of
the 1-Wasserstein distance are detailed in the Gaussian case. The effect of
differential privacy using the Laplace and Gaussian mechanisms on the
1-Wasserstein distance is studied using the closed-form expressions and bounds.
( 2
min )
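The stated identity can be checked numerically in the Gaussian case: the mean of a folded normal with the difference of locations and scales, against the quantile-integral definition of the 1-Wasserstein distance (standard library only):

```python
import math
from statistics import NormalDist

def w1_gaussians(mu1, s1, mu2, s2):
    """W1 between N(mu1, s1^2) and N(mu2, s2^2), via the abstract's identity:
    the mean of a folded normal with location mu1 - mu2 and scale |s1 - s2|."""
    mu, s = mu1 - mu2, abs(s1 - s2)
    if s == 0.0:
        return abs(mu)
    # closed-form mean of the folded normal |N(mu, s^2)|
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * s**2))
            + mu * (1.0 - 2.0 * NormalDist().cdf(-mu / s)))

def w1_quantile(mu1, s1, mu2, s2, n=50_000):
    """Reference value: W1 = integral over (0,1) of |F1^{-1}(u) - F2^{-1}(u)|,
    approximated by a midpoint rule on the quantile grid."""
    f1, f2 = NormalDist(mu1, s1), NormalDist(mu2, s2)
    return sum(abs(f1.inv_cdf((k + 0.5) / n) - f2.inv_cdf((k + 0.5) / n))
               for k in range(n)) / n
```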
Sparse principal component analysis (SPCA) is widely used for dimensionality
reduction and feature extraction in high-dimensional data analysis. Despite
many methodological and theoretical developments in the past two decades, the
theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie &
Tibshirani (2006) are still unknown. This paper aims to address this critical
gap. We first revisit the SPCA algorithm of Zou et al. (2006) and present our
implementation. We also study a computationally more efficient variant of the
SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case
of SPCA. We provide the guarantees of convergence to a stationary point for
both algorithms and prove that, under a sparse spiked covariance model, both
algorithms can recover the principal subspace consistently under mild
regularity conditions. We show that their estimation error bounds match the
best available bounds of existing works or the minimax rates up to some
logarithmic factors. Moreover, we demonstrate the competitive numerical
performance of both algorithms in numerical studies.
( 2
min )
Message Passing Neural Networks (MPNNs) are instances of Graph Neural
Networks that leverage the graph to send messages over the edges. This
inductive bias leads to a phenomenon known as over-squashing, where a node
feature is insensitive to information contained at distant nodes. Despite
recent methods introduced to mitigate this issue, an understanding of the
causes of over-squashing and of possible solutions is lacking. In this
theoretical work, we prove that: (i) Neural network width can mitigate
over-squashing, but at the cost of making the whole network more sensitive;
(ii) Conversely, depth cannot help mitigate over-squashing: increasing the
number of layers leads to over-squashing being dominated by vanishing
gradients; (iii) The graph topology plays the greatest role, since
over-squashing occurs between nodes at high commute (access) time. Our analysis
provides a unified framework to study different recent methods introduced to
cope with over-squashing and serves as a justification for a class of methods
that fall under 'graph rewiring'.
( 2
min )
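The commute time invoked above is computable from the pseudoinverse of the graph Laplacian via C(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv), so the pairs most exposed to over-squashing can be read off directly. A sketch:

```python
import numpy as np

def commute_times(adj):
    """All-pairs commute times from the Laplacian pseudoinverse:
    C(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj            # combinatorial Laplacian
    Lp = np.linalg.pinv(L)            # Moore-Penrose pseudoinverse
    vol = deg.sum()                   # 2 * number of edges
    d = np.diag(Lp)
    return vol * (d[:, None] + d[None, :] - 2.0 * Lp)
```

On a path graph the commute time grows with distance, matching the intuition that far-apart node pairs squash information the most.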
Recovering the latent factors of variation of high dimensional data has so
far focused on simple synthetic settings. Mostly building on unsupervised and
weakly-supervised objectives, prior work missed out on the positive
implications for representation learning on real world data. In this work, we
propose to leverage knowledge extracted from a diversified set of supervised
tasks to learn a common disentangled representation. Assuming each supervised
task only depends on an unknown subset of the factors of variation, we
disentangle the feature space of a supervised multi-task model, with features
activating sparsely across different tasks and information being shared as
appropriate. Importantly, we never directly observe the factors of variation
but establish that access to multiple tasks is sufficient for identifiability
under sufficiency and minimality assumptions. We validate our approach on six
real world distribution shift benchmarks, and different data modalities
(images, text), demonstrating how disentangled representations can be
transferred to real settings.
( 2
min )
We study variance-dependent regret bounds for Markov decision processes
(MDPs). Algorithms with variance-dependent regret guarantees can automatically
exploit environments with low variance (e.g., enjoying constant regret on
deterministic MDPs). The existing algorithms are either variance-independent or
suboptimal. We first propose two new environment norms to characterize the
fine-grained variance properties of the environment. For model-based methods,
we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new
analysis techniques to show that this algorithm enjoys variance-dependent bounds
with respect to our proposed norms. In particular, this bound is simultaneously
minimax optimal for both stochastic and deterministic MDPs, the first result of
its kind. We further initiate the study on model-free algorithms with
variance-dependent regret bounds by designing a reference-function-based
algorithm with a novel capped-doubling reference update schedule. Lastly, we
also provide lower bounds to complement our upper bounds.
( 2
min )
A spiking neural network (SNN) equalizer with a decision feedback structure
is applied to an IM/DD link with various parameters. The SNN outperforms linear
and artificial neural network (ANN) based equalizers.
( 2
min )
The goal of this paper is to learn more about how idiomatic information is
structurally encoded in embeddings, using a structural probing method. We
repurpose an existing English verbal multi-word expression (MWE) dataset to
suit the probing framework and perform a comparative probing study of static
(GloVe) and contextual (BERT) embeddings. Our experiments indicate that both
encode some idiomatic information to varying degrees, but yield conflicting
evidence as to whether idiomaticity is encoded in the vector norm, leaving this
an open question. We also identify some limitations of the used dataset and
highlight important directions for future work in improving its suitability for
a probing analysis.
( 2
min )
Annealed Importance Sampling (AIS) moves particles along a Markov chain from
a tractable initial distribution to an intractable target distribution. The
recently proposed Differentiable AIS (DAIS) (Geffner and Domke, 2021; Zhang et
al., 2021) enables efficient optimization of the transition kernels of AIS and
of the distributions. However, we observe a low effective sample size in DAIS,
indicating degenerate distributions. We thus propose to extend DAIS by a
resampling step inspired by Sequential Monte Carlo. Surprisingly, we find
empirically, and can explain theoretically, that it is not necessary to
differentiate through the resampling step, which avoids the gradient variance issues
observed in similar approaches for Particle Filters (Maddison et al., 2017;
Naesseth et al., 2018; Le et al., 2018).
( 2
min )
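The resampling step borrowed from Sequential Monte Carlo can be sketched with systematic resampling, one common low-variance scheme; the abstract does not prescribe this exact variant:

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling: given normalised particle weights, return
    ancestor indices. High-weight particles are duplicated and low-weight
    ones dropped, restoring the effective sample size after the step."""
    rng = np.random.default_rng(rng)
    n = len(weights)
    # one uniform offset shared by n evenly spaced positions in (0, 1)
    positions = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)
```

Since the paper finds that gradients need not flow through this step, the indices can be used as-is, with no reparameterisation.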
Organizations are increasingly adopting machine learning (ML) for personnel
assessment. However, concerns exist about fairness in designing and
implementing ML assessments. Supervised ML models are trained to model patterns
in data, meaning ML models tend to yield predictions that reflect subgroup
differences in applicant attributes in the training data, regardless of the
underlying cause of subgroup differences. In this study, we systematically
under- and oversampled minority (Black and Hispanic) applicants to manipulate
adverse impact ratios in training data and investigated how training data
adverse impact ratios affect ML model adverse impact and accuracy. We used
self-reports and interview transcripts from job applicants (N = 2,501) to train
9,702 ML models to predict screening decisions. We found that training data
adverse impact related linearly to ML model adverse impact. However, removing
adverse impact from training data only slightly reduced ML model adverse impact
and tended to negatively affect ML model accuracy. We observed consistent
effects across self-reports and interview transcripts, whether oversampling
real (i.e., bootstrapping) or synthetic observations. As our study relied on
limited predictor sets from one organization, the observed effects on adverse
impact may be attenuated among more accurate ML models.
( 2
min )
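The adverse impact ratio manipulated in that study is the standard ratio of subgroup selection rates; a minimal helper (the function name and signature are ours):

```python
def adverse_impact_ratio(sel_minority, n_minority, sel_majority, n_majority):
    """Adverse impact (AI) ratio: minority selection rate divided by majority
    selection rate. Values below 0.8 trip the common four-fifths rule."""
    return (sel_minority / n_minority) / (sel_majority / n_majority)
```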
This tutorial survey provides an overview of recent non-asymptotic advances
in statistical learning theory as relevant to control and system
identification. While there has been substantial progress across all areas of
control, the theory is most well-developed when it comes to linear system
identification and learning for the linear quadratic regulator, which are the
focus of this manuscript. From a theoretical perspective, much of the labor
underlying these advances has been in adapting tools from modern
high-dimensional statistics and learning theory. While highly relevant to
control theorists interested in integrating tools from machine learning, the
foundational material has not always been easily accessible. To remedy this, we
provide a self-contained presentation of the relevant material, outlining all
the key ideas and the technical machinery that underpin recent results. We also
present a number of open problems and future directions.
( 2
min )
Conditional Average Treatment Effects (CATE) estimation is one of the main
challenges in causal inference with observational data. In addition to Machine
Learning-based models, nonparametric estimators called meta-learners have been
developed to estimate the CATE with the main advantage of not restraining the
estimation to a specific supervised learning method. This task becomes,
however, more complicated when the treatment is not binary as some limitations
of the naive extensions emerge. This paper looks into meta-learners for
estimating the heterogeneous effects of multi-valued treatments. We consider
different meta-learners, and we carry out a theoretical analysis of their error
upper bounds as functions of important parameters such as the number of
treatment levels, showing that the naive extensions do not always provide
satisfactory results. We introduce and discuss meta-learners that perform well
as the number of treatments increases. We empirically confirm the strengths and
weaknesses of those methods with synthetic and semi-synthetic datasets.
( 2
min )
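A naive T-learner extension to multi-valued treatments, of the kind the paper analyzes, can be sketched as follows; the plain least-squares base learner is our simplification, since meta-learners accept any regressor:

```python
import numpy as np

def t_learner_cate(X, t, y, levels, x_new):
    """Naive multi-valued T-learner: fit one outcome regression per treatment
    level (here ordinary least squares with an intercept) and contrast each
    level against control level 0 at the query points x_new."""
    Xb = np.column_stack([np.ones(len(X)), X])
    Xq = np.column_stack([np.ones(len(x_new)), x_new])
    coef = {lv: np.linalg.lstsq(Xb[t == lv], y[t == lv], rcond=None)[0]
            for lv in levels}
    return {lv: Xq @ (coef[lv] - coef[0]) for lv in levels if lv != 0}
```

Note how each level's model sees only its own subsample, which is exactly why such naive extensions degrade as the number of treatment levels grows.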
A crucial challenge in reinforcement learning is to reduce the number of
interactions with the environment that an agent requires to master a given
task. Transfer learning proposes to address this issue by re-using knowledge
from previously learned tasks. However, determining which source task qualifies
as the most appropriate for knowledge extraction, as well as the choice
regarding which algorithm components to transfer, represent severe obstacles to
its application in reinforcement learning. The goal of this paper is to address
these issues with modular multi-source transfer learning techniques. The
proposed techniques automatically learn how to extract useful information from
source tasks, regardless of the difference in state-action space and reward
function. We support our claims with extensive and challenging cross-domain
experiments for visual control.
( 2
min )
We analyze the generalization ability of joint-training meta learning
algorithms via the Gibbs algorithm. Our exact characterization of the expected
meta generalization error for the meta Gibbs algorithm is based on symmetrized
KL information, which measures the dependence between all meta-training
datasets and the output parameters, including task-specific and meta
parameters. Additionally, we derive an exact characterization of the meta
generalization error for the super-task Gibbs algorithm, in terms of
conditional symmetrized KL information within the super-sample and super-task
framework introduced in Steinke and Zakynthinou (2020) and Hellstrom and Durisi
(2022) respectively. Our results also enable us to provide novel
distribution-free generalization error upper bounds for these Gibbs algorithms
applicable to meta learning.
( 2
min )
Many techniques in machine learning attempt explicitly or implicitly to infer
a low-dimensional manifold structure of an underlying physical phenomenon from
measurements without an explicit model of the phenomenon or the measurement
apparatus. This paper presents a cautionary tale regarding the discrepancy
between the geometry of measurements and the geometry of the underlying
phenomenon in a benign setting. The deformation in the metric illustrated in
this paper is mathematically straightforward and unavoidable in the general
case, and it is only one of several similar effects. While this is not always
problematic, we provide an example of an arguably standard and harmless data
processing procedure where this effect leads to an incorrect answer to a
seemingly simple question. Although we focus on manifold learning, these issues
apply broadly to dimensionality reduction and unsupervised learning.
( 2
min )
The learnable, linear neural network layers between tensor power spaces of
$\mathbb{R}^{n}$ that are equivariant to the orthogonal group, $O(n)$, the
special orthogonal group, $SO(n)$, and the symplectic group, $Sp(n)$, were
characterised in arXiv:2212.08630. We present an algorithm for multiplying a
vector by any weight matrix for each of these groups, using category theoretic
constructions to implement the procedure. We achieve a significant reduction in
computational cost compared with a naive implementation by making use of
Kronecker product matrices to perform the multiplication. We show that our
approach extends to the symmetric group, $S_n$, recovering the algorithm of
arXiv:2303.06208 in the process.
( 2
min )
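The cost saving from Kronecker-structured weight matrices comes from the identity (A ⊗ B) vec(X) = vec(B X A^T), which multiplies by the factors separately instead of materialising A ⊗ B. A sketch, not the paper's category-theoretic implementation:

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product,
    using (A kron B) vec(X) = vec(B X A^T) with column-stacking vec."""
    n, m = A.shape[1], B.shape[1]
    X = x.reshape(n, m).T            # undo column-stacking: X is (m, n)
    return (B @ X @ A.T).T.reshape(-1)
```

For A of size q x n and B of size p x m this costs O(pmn + pnq) flops rather than the O(pq * mn) of the dense product.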
Neural network model compression techniques can address the computation issue
of deep neural networks on embedded devices in industrial systems. The
guaranteed output error computation problem for neural network compression with
quantization is addressed in this paper. A merged neural network is built from
a feedforward neural network and its quantized version to produce the exact
output difference between two neural networks. Then, optimization-based methods
and reachability analysis methods are applied to the merged neural network to
compute the guaranteed quantization error. Finally, a numerical example is
proposed to validate the applicability and effectiveness of the proposed
approach.
( 2
min )
We introduce a new computational framework for estimating parameters in
generalized generalized linear models (GGLM), a class of models that extends
the popular generalized linear models (GLM) to account for dependencies among
observations in spatio-temporal data. The proposed approach uses a monotone
operator-based variational inequality method to overcome non-convexity in
parameter estimation and provide guarantees for parameter recovery. The results
can be applied to GLM and GGLM, focusing on spatio-temporal models. We also
present online instance-based bounds using martingale concentration
inequalities. Finally, we demonstrate the performance of the algorithm using
numerical simulations and a real data example for wildfire incidents.
( 2
min )
Gradient-boosted decision trees (GBDT) are a widely used and highly effective
machine learning approach for tabular data modeling. However, their complex
structure may lead to low robustness against small covariate perturbation in
unseen data. In this study, we apply one-hot encoding to convert a GBDT model
into a linear framework, by encoding each tree leaf as one dummy
variable. This allows for the use of linear regression techniques, plus a novel
risk decomposition for assessing the robustness of a GBDT model against
covariate perturbations. We propose to enhance the robustness of GBDT models by
refitting their linear regression forms with $L_1$ or $L_2$ regularization.
Theoretical results are obtained about the effect of regularization on the
model performance and robustness. It is demonstrated through numerical
experiments that the proposed regularization approach can enhance the
robustness of the one-hot-encoded GBDT models.
( 2
min )
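The one-hot linearization and regularized refit can be sketched as follows. Leaf assignments are supplied externally here; a real pipeline would obtain them from the fitted ensemble (e.g. scikit-learn's `apply` method), and closed-form ridge stands in for a full $L_1$/$L_2$ fitting routine:

```python
import numpy as np

def onehot_refit(leaf_idx, y, l2=1.0):
    """Refit a tree ensemble in its linear form. leaf_idx[i, j] is the leaf
    that sample i falls into in tree j; each leaf becomes one dummy column,
    and ridge regression re-estimates the leaf values with L2 shrinkage."""
    n, n_trees = leaf_idx.shape
    cols = []
    for j in range(n_trees):
        _, codes = np.unique(leaf_idx[:, j], return_inverse=True)
        Z = np.zeros((n, codes.max() + 1))
        Z[np.arange(n), codes] = 1.0      # one-hot leaf membership
        cols.append(Z)
    Z = np.hstack(cols)
    beta = np.linalg.solve(Z.T @ Z + l2 * np.eye(Z.shape[1]), Z.T @ y)
    return Z, beta
```

Shrinking the leaf coefficients is what damps the model's response to small covariate perturbations that flip a sample into a neighbouring leaf.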
In this paper, a computationally efficient data-driven hybrid automaton model
is proposed to capture unknown complex dynamical system behaviors using
multiple neural networks. The sampled data of the system is divided by valid
partitions into groups corresponding to their topologies, based on which
transition guards are defined. Then, a collection of small-scale neural
networks that are computationally efficient are trained as the local dynamical
description for their corresponding topologies. After modeling the system with
a neural-network-based hybrid automaton, the set-valued reachability analysis
with low computation cost is provided based on interval analysis and a
split-and-combine process. Finally, a numerical example of a limit cycle is
presented to illustrate that the developed models can significantly reduce the
computational cost in reachable set computation without sacrificing any
modeling precision.
( 2
min )
Federated learning (FL) is an emerging technique that trains models on massive,
geographically distributed edge data while maintaining privacy. However, FL has
inherent challenges in terms of fairness and computational efficiency due to
the rising heterogeneity of edges, and thus usually results in sub-optimal
performance in recent state-of-the-art (SOTA) solutions. In this paper, we
propose a Customized Federated Learning (CFL) system to eliminate FL
heterogeneity from multiple dimensions. Specifically, CFL tailors personalized
models from the specially designed global model for each client jointly guided
by an online trained model-search helper and a novel aggregation algorithm.
Extensive experiments demonstrate that CFL has full-stack advantages for both
FL training and edge reasoning and significantly improves the SOTA performance
w.r.t. model accuracy (up to 7.2% in the non-heterogeneous environment and up
to 21.8% in the heterogeneous environment), efficiency, and FL fairness.
( 2
min )
Semantic knowledge of part-part and part-whole relationships in assemblies is
useful for a variety of tasks from searching design repositories to the
construction of engineering knowledge bases. In this work we propose that the
natural language names designers use in Computer Aided Design (CAD) software
are a valuable source of such knowledge, and that Large Language Models (LLMs)
contain useful domain-specific information for working with this data as well
as other CAD and engineering-related tasks.
In particular we extract and clean a large corpus of natural language part,
feature and document names and use this to quantitatively demonstrate that a
pre-trained language model can outperform numerous benchmarks on three
self-supervised tasks, without ever having seen this data before. Moreover, we
show that fine-tuning on the text data corpus further boosts the performance on
all tasks, thus demonstrating the value of the text data which until now has
been largely ignored. We also identify key limitations to using LLMs with text
data alone, and our findings provide a strong motivation for further work into
multi-modal text-geometry models.
To aid and encourage further work in this area we make all our data and code
publicly available.
( 2
min )
In this paper we present the first version of ganX -- generate artificially
new XRF, a Python library to generate X-ray fluorescence Macro maps (MA-XRF)
from a coloured RGB image. To do that, a Monte Carlo method is used, where each
MA-XRF pixel signal is sampled from an XRF signal probability function. Such a
probability function is computed from a database of (pigment characteristic XRF
signal, RGB) pairs, as a sum of the pigments' XRF signals weighted by the
proximity of the image RGB to each pigment's characteristic RGB. The library is
released on PyPI and the code is available open source on GitHub.
( 2
min )
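The per-pixel sampling scheme can be sketched as follows. The inverse-distance weighting and the per-pixel count budget are illustrative assumptions, since the abstract does not fix either:

```python
import numpy as np

def pixel_xrf(rgb, pigment_rgbs, pigment_spectra, rng=None):
    """Sample one MA-XRF pixel spectrum: weight each pigment's characteristic
    spectrum by proximity of its reference RGB to the pixel RGB (here a simple
    inverse-distance weight, an assumption), then draw detector counts from
    the resulting mixture distribution."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(pigment_rgbs - rgb, axis=1)   # RGB distance per pigment
    w = 1.0 / (d + 1e-6)
    w /= w.sum()
    prob = w @ pigment_spectra        # weighted mixture of characteristic spectra
    prob /= prob.sum()
    return rng.multinomial(1000, prob)  # 1000 counts per pixel, an assumption
```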
Distributed learning paradigms, such as federated or decentralized learning,
allow a collection of agents to solve global learning and optimization problems
through limited local interactions. Most such strategies rely on a mixture of
local adaptation and aggregation steps, either among peers or at a central
fusion center. Classically, aggregation in distributed learning is based on
averaging, which is statistically efficient, but susceptible to attacks by even
a small number of malicious agents. This observation has motivated a number of
recent works, which develop robust aggregation schemes by employing robust
variations of the mean. We present a new attack based on sensitivity curve
maximization (SCM), and demonstrate that it is able to disrupt existing robust
aggregation schemes by injecting small, but effective perturbations.
( 2
min )
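The sensitivity curve that the attack maximizes is a standard robustness diagnostic: how much one injected point moves the estimate. A minimal implementation:

```python
import numpy as np

def sensitivity_curve(estimator, sample, x_grid):
    """Empirical sensitivity curve SC(x) = (n + 1) * (T(sample + [x]) - T(sample)).
    The SCM attack injects perturbations where |SC| of the robust aggregator
    is largest."""
    n = len(sample)
    base = estimator(sample)
    return np.array([(n + 1) * (estimator(np.append(sample, x)) - base)
                     for x in x_grid])
```

For the mean, SC(x) = x - mean(sample) grows without bound; for robust aggregators such as the median it is bounded, and the attack exploits exactly where that bounded curve peaks.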
This paper provides answers to an open problem: given a nonlinear data-driven
dynamical system model, e.g., kernel conditional mean embedding (CME) and
Koopman operator, how can one propagate the ambiguity sets forward for multiple
steps? This problem is the key to solving distributionally robust control and
learning-based control of such learned system models under a data-distribution
shift. Different from previous works that use either static ambiguity sets,
e.g., fixed Wasserstein balls, or dynamic ambiguity sets under known piece-wise
linear (or affine) dynamics, we propose an algorithm that exactly propagates
ambiguity sets through nonlinear data-driven models using the Koopman operator
and CME, via the kernel maximum mean discrepancy geometry. Through both
theoretical and numerical analysis, we show that our kernel ambiguity sets are
the natural geometric structure for the learned data-driven dynamical system
models.
( 2
min )
The success of the Adam optimizer on a wide array of architectures has made
it the default in settings where stochastic gradient descent (SGD) performs
poorly. However, our theoretical understanding of this discrepancy is lagging,
preventing the development of significant improvements on either algorithm.
Recent work advances the hypothesis that Adam and other heuristics like
gradient clipping outperform SGD on language tasks because the distribution of
the error induced by sampling has heavy tails. This suggests that Adam
outperforms SGD because it uses a more robust gradient estimate. We evaluate
this hypothesis by varying the batch size, up to the entire dataset, to control
for stochasticity. We present evidence that stochasticity and heavy-tailed
noise are not major factors in the performance gap between SGD and Adam.
Rather, Adam performs better as the batch size increases, while SGD is less
effective at taking advantage of the reduction in noise. This raises the
question as to why Adam outperforms SGD in the full-batch setting. Through
further investigation of simpler variants of SGD, we find that the behavior of
Adam with large batches is similar to sign descent with momentum.
( 2
min )
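Sign descent with momentum ("signum"), which the paper finds Adam resembles at large batch sizes, is a two-line update: accumulate a momentum buffer, then step by its sign only, discarding gradient magnitude:

```python
import numpy as np

def signum_step(w, grad, m, lr=0.01, beta=0.9):
    """One step of sign descent with momentum: update the exponential moving
    average of gradients, then move by lr times its *sign*."""
    m = beta * m + (1 - beta) * grad
    return w - lr * np.sign(m), m
```

On a simple quadratic the iterate marches toward the minimum at a fixed rate per step and then oscillates in a band of width set by the learning rate and momentum lag.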
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
( 2
min )
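As a flavor of the algorithms such a library implements, here is global magnitude pruning; this is a NumPy sketch of the general technique, not JaxPruner's API:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Global magnitude pruning: zero out the smallest-magnitude fraction
    `sparsity` of the weights, keeping the rest unchanged."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > thresh, w, 0.0)
```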
In recent years, there has been a surge in effort to formalize notions of
fairness in machine learning. We focus on clustering -- one of the fundamental
tasks in unsupervised machine learning. We propose a new axiom that captures
proportional representation fairness (PRF). We make a case that the concept
achieves the raison d'être of several existing concepts in the literature
in an arguably more convincing manner. Our fairness concept is not satisfied by
existing fair clustering algorithms. We design efficient algorithms to achieve
PRF both for unconstrained and discrete clustering problems.
( 2
min )
In documents and graphics, contours are a popular format to describe specific
shapes. For example, in the True Type Font (TTF) file format, contours describe
vector outlines of typeface shapes. Each contour is often defined as a sequence
of points. In this paper, we tackle the contour completion task. In this task,
the input is a contour sequence with missing points, and the output is a
generated completed contour. This task is more difficult than image completion
because, for images, the missing pixels are indicated. Since there is no such
indication in the contour completion task, we must solve the problem of missing
part detection and completion simultaneously. We propose a Transformer-based
method to solve this problem and show the results of the typeface contour
completion.
( 2
min )
This study aims to alleviate the trade-off between utility and privacy in the
task of differentially private clustering. Existing works focus on simple
clustering methods, which show poor clustering performance for non-convex
clusters. By utilizing Morse theory, we hierarchically connect the Gaussian
sub-clusters to fit complex cluster distributions. Because differentially
private sub-clusters are obtained through the existing methods, the proposed
method causes little or no additional privacy loss. We provide a theoretical
background that implies that the proposed method is inductive and can achieve
any desired number of clusters. Experiments on various datasets show that our
framework achieves better clustering performance at the same privacy level,
compared to the existing methods.
( 2
min )
In 1-bit matrix completion, the aim is to estimate an underlying low-rank
matrix from a partial set of binary observations. We propose a novel method for
1-bit matrix completion called MMGN. Our method is based on the
majorization-minimization (MM) principle, which yields a sequence of standard
low-rank matrix completion problems in our setting. We solve each of these
sub-problems by a factorization approach that explicitly enforces the assumed
low-rank structure and then apply a Gauss-Newton method. Our numerical studies
and application to a real-data example illustrate that MMGN outputs comparable,
if not more accurate, estimates, is often significantly faster, and is less
sensitive to the spikiness of the underlying matrix than existing methods.
( 2
min )
In recent years, product categorisation has been a common issue for
E-commerce companies who have utilised machine learning to categorise their
products automatically. In this study, we propose an ensemble approach, using a
combination of different models to separately predict each product's category,
subcategory, and colour before ultimately combining the resultant predictions
for each product. With the aforementioned approach, we show that an average
F1-score of 0.82 can be achieved using a combination of XGBoost and k-nearest
neighbours to predict said features.
( 2
min )
Innovative Electronic Design Automation (EDA) solutions are important to meet
the design requirements for increasingly complex electronic devices. Verilog, a
hardware description language, is widely used for the design and verification
of digital circuits and is synthesized using specific EDA tools. However,
writing code is a repetitive and time-intensive task. This paper proposes,
primarily, a novel deep learning framework for training a Verilog
autocompletion model and, secondarily, a Verilog dataset of files and snippets
obtained from open-source repositories. The framework involves integrating
models pretrained on general programming language data and finetuning them on a
dataset curated to be similar to a target downstream task. This is validated by
comparing different pretrained models trained on different subsets of the
proposed Verilog dataset using multiple evaluation metrics. These experiments
demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF
scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from
scratch.
( 2
min )
While the use of the Internet of Things is becoming more and more popular,
many security vulnerabilities are emerging with the large number of devices
being introduced to the market. In this environment, IoT device identification
methods provide a preventive security measure as an important factor in
identifying these devices and detecting the vulnerabilities they suffer from.
In this study, we present a method that identifies devices in the Aalto dataset
using Long Short-Term Memory (LSTM) networks.
( 2
min )
In online forums like Reddit, users share their experiences with medical
conditions and treatments, including making claims, asking questions, and
discussing the effects of treatments on their health. Building systems to
understand this information can effectively monitor the spread of
misinformation and verify user claims. The Task-8 of the 2023 International
Workshop on Semantic Evaluation focused on medical applications, specifically
extracting patient experience- and medical condition-related entities from user
posts on social media. The Reddit Health Online Talk (RedHot) corpus contains
posts from medical condition-related subreddits with annotations characterizing
the patient experience and medical conditions. In Subtask-1, patient experience
is characterized by personal experience, questions, and claims. In Subtask-2,
medical conditions are characterized by population, intervention, and outcome.
For the automatic extraction of patient experiences and medical condition
information, as a part of the challenge, we proposed language-model-based
extraction systems that ranked $3^{rd}$ on both subtasks' leaderboards. In this
work, we describe our approach and, in addition, explore the automatic
extraction of this information using domain-specific language models and the
inclusion of external knowledge.
( 2
min )
This paper assesses the reliability of the RemOve-And-Retrain (ROAR)
protocol, which is used to measure the performance of feature importance
estimates. Our findings from the theoretical background and empirical
experiments indicate that attributions that possess less information about the
decision function can perform better in ROAR benchmarks, conflicting with the
original purpose of ROAR. This phenomenon is also observed in the recently
proposed variant RemOve-And-Debias (ROAD), and we propose a consistent trend of
blurriness bias in ROAR attribution metrics. Our results caution against
uncritical reliance on ROAR metrics.
( 2
min )
Current dialogue research primarily studies pairwise (two-party)
conversations, and does not address the everyday setting where more than two
speakers converse together. In this work, we both collect and evaluate
multi-party conversations to study this more general case. We use the LIGHT
environment to construct grounded conversations, where each participant has an
assigned character to role-play. We thus evaluate the ability of language
models to act as one or more characters in such conversations. Models require
two skills that pairwise-trained models appear to lack: (1) being able to
decide when to talk; (2) producing coherent utterances grounded on multiple
characters. We compare models trained on our new dataset to existing
pairwise-trained dialogue models, as well as large language models with
few-shot prompting. We find that our new dataset, MultiLIGHT, which we will
publicly release, can help bring significant improvements in the group setting.
( 2
min )
To improve computer-aided breast mass classification in mammographic images,
in this work we explore state-of-the-art classification networks to develop an
ensemble mechanism.
First, the regions of interest (ROIs) are obtained from the original dataset,
and then three models, i.e., XceptionNet, DenseNet, and EfficientNet, are
trained individually. After training, we build the ensemble by summing the
probabilities output by each network, which enhances the performance by up to
5%. The scheme has been validated on a public dataset, achieving accuracy,
precision, and recall of 88%, 85%, and 76%, respectively.
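The probability-summing ensemble described above can be sketched as follows; the toy probability arrays are hypothetical stand-ins for the outputs of XceptionNet, DenseNet, and EfficientNet:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Sum class probabilities from several models and pick the argmax class.

    prob_list: list of (n_samples, n_classes) arrays, one per model.
    """
    summed = np.sum(prob_list, axis=0)   # element-wise sum over models
    return np.argmax(summed, axis=1)     # predicted class per sample

# Toy example with three hypothetical models and two classes
p1 = np.array([[0.6, 0.4], [0.2, 0.8]])
p2 = np.array([[0.7, 0.3], [0.4, 0.6]])
p3 = np.array([[0.3, 0.7], [0.1, 0.9]])
preds = ensemble_predict([p1, p2, p3])   # summed: [[1.6, 1.4], [0.7, 2.3]]
```

Summing (or equivalently averaging) probabilities is a standard soft-voting scheme; dividing by the number of models would not change the argmax.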
( 2
min )
Gradient-boosted decision trees (GBDT) are a widely used and highly effective
machine learning approach for tabular data modeling. However, their complex
structure may lead to low robustness against small covariate perturbations in
unseen data. In this study, we apply one-hot encoding to convert a GBDT model
into a linear framework, through encoding of each tree leaf to one dummy
variable. This allows for the use of linear regression techniques, plus a novel
risk decomposition for assessing the robustness of a GBDT model against
covariate perturbations. We propose to enhance the robustness of GBDT models by
refitting their linear regression forms with $L_1$ or $L_2$ regularization.
Theoretical results are obtained about the effect of regularization on the
model performance and robustness. It is demonstrated through numerical
experiments that the proposed regularization approach can enhance the
robustness of the one-hot-encoded GBDT models.
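A minimal sketch of the one-hot leaf encoding and regularized refit, assuming the per-tree leaf indices have already been extracted from a fitted GBDT (the toy indices and regularization strength below are made up):

```python
import numpy as np

def one_hot_leaves(leaf_idx, n_leaves_per_tree):
    """Encode per-tree leaf indices (n_samples, n_trees) into one dummy
    variable per leaf, giving an (n_samples, total_leaves) design matrix."""
    n, _ = leaf_idx.shape
    offsets = np.concatenate([[0], np.cumsum(n_leaves_per_tree)[:-1]])
    X = np.zeros((n, int(np.sum(n_leaves_per_tree))))
    X[np.arange(n)[:, None], leaf_idx + offsets] = 1.0
    return X

def ridge_refit(X, y, lam=1.0):
    """Refit leaf values by L2-regularized least squares (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy example: 4 samples, 2 trees with 2 leaves each
leaf_idx = np.array([[0, 1], [1, 0], [0, 0], [1, 1]])
X = one_hot_leaves(leaf_idx, [2, 2])
y = np.array([1.0, 2.0, 1.5, 2.5])
beta = ridge_refit(X, y, lam=0.1)   # refitted leaf values
pred = X @ beta                      # linear-form GBDT prediction
```

An $L_1$ variant would replace the closed-form solve with a lasso solver; the design matrix is the same.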
( 2
min )
Annealed Importance Sampling (AIS) moves particles along a Markov chain from
a tractable initial distribution to an intractable target distribution. The
recently proposed Differentiable AIS (DAIS) (Geffner and Domke, 2021; Zhang et
al., 2021) enables efficient optimization of the transition kernels of AIS and
of the distributions. However, we observe a low effective sample size in DAIS,
indicating degenerate distributions. We thus propose to extend DAIS by a
resampling step inspired by Sequential Monte Carlo. Surprisingly, we find
empirically, and can explain theoretically, that it is not necessary to
differentiate through the resampling step, which avoids the gradient variance issues
observed in similar approaches for Particle Filters (Maddison et al., 2017;
Naesseth et al., 2018; Le et al., 2018).
( 2
min )
We detail an approach to develop Stein's method for bounding integral metrics
on probability measures defined on a Riemannian manifold $\mathbf M$. Our
approach exploits the relationship between the generator of a diffusion on
$\mathbf M$ with target invariant measure and its characterising Stein
operator. We consider a pair of such diffusions with different starting points,
and through analysis of the distance process between the pair, derive Stein
factors, which bound the solution to the Stein equation and its derivatives.
The Stein factors contain curvature-dependent terms and reduce to those
currently available for $\mathbb R^m$, and moreover imply that the bounds for
$\mathbb R^m$ remain valid when $\mathbf M$ is a flat manifold.
( 2
min )
Conditional Average Treatment Effects (CATE) estimation is one of the main
challenges in causal inference with observational data. In addition to Machine
Learning-based models, nonparametric estimators called meta-learners have been
developed to estimate the CATE with the main advantage of not restraining the
estimation to a specific supervised learning method. This task becomes,
however, more complicated when the treatment is not binary as some limitations
of the naive extensions emerge. This paper looks into meta-learners for
estimating the heterogeneous effects of multi-valued treatments. We consider
different meta-learners, and we carry out a theoretical analysis of their error
upper bounds as functions of important parameters such as the number of
treatment levels, showing that the naive extensions do not always provide
satisfactory results. We introduce and discuss meta-learners that perform well
as the number of treatments increases. We empirically confirm the strengths and
weaknesses of those methods with synthetic and semi-synthetic datasets.
( 2
min )
We introduce a new computational framework for estimating parameters in
generalized generalized linear models (GGLM), a class of models that extends
the popular generalized linear models (GLM) to account for dependencies among
observations in spatio-temporal data. The proposed approach uses a monotone
operator-based variational inequality method to overcome non-convexity in
parameter estimation and provide guarantees for parameter recovery. The results
can be applied to GLM and GGLM, focusing on spatio-temporal models. We also
present online instance-based bounds using martingale concentration
inequalities. Finally, we demonstrate the performance of the algorithm using
numerical simulations and a real data example for wildfire incidents.
( 2
min )
Variational Bayes is a popular method for approximate inference but its
derivation can be cumbersome. To simplify the process, we give a 3-step recipe
to identify the posterior form by explicitly looking for linearity with respect
to expectations of well-known distributions. We can then directly write the
update by simply ``reading-off'' the terms in front of those expectations. The
recipe makes the derivation easier, faster, shorter, and more general.
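As a toy illustration of such a reading-off step (our example, not one from the paper), consider $x_i \mid \mu \sim \mathcal N(\mu, 1)$ for $i = 1, \dots, N$ with prior $\mu \sim \mathcal N(0, 1)$:

```latex
% Step 1: write the log joint as a function of \mu and collect the terms
% that are linear in the Gaussian sufficient statistics (\mu, \mu^2):
\log p(x, \mu) = -\frac{N + 1}{2}\,\mu^{2}
  + \Big(\sum_{i=1}^{N} x_i\Big)\mu + \mathrm{const}.
% Step 2: compare with \log \mathcal{N}(\mu \mid m, s^{2})
%   = -\frac{1}{2 s^{2}}\,\mu^{2} + \frac{m}{s^{2}}\,\mu + \mathrm{const}.
% Step 3: read off the posterior parameters from the matching coefficients:
s^{2} = \frac{1}{N + 1}, \qquad m = \frac{1}{N + 1}\sum_{i=1}^{N} x_i.
```

The same pattern of matching coefficients in front of expectations of sufficient statistics applies to non-conjugate models once the expectations are taken under the variational distribution.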
( 2
min )
In 1-bit matrix completion, the aim is to estimate an underlying low-rank
matrix from a partial set of binary observations. We propose a novel method for
1-bit matrix completion called MMGN. Our method is based on the
majorization-minimization (MM) principle, which yields a sequence of standard
low-rank matrix completion problems in our setting. We solve each of these
sub-problems by a factorization approach that explicitly enforces the assumed
low-rank structure and then apply a Gauss-Newton method. Our numerical studies
and application to a real-data example illustrate that MMGN produces estimates
that are comparable to, if not more accurate than, those of existing methods,
is often significantly faster, and is less sensitive to the spikiness of the
underlying matrix.
( 2
min )
This tutorial survey provides an overview of recent non-asymptotic advances
in statistical learning theory as relevant to control and system
identification. While there has been substantial progress across all areas of
control, the theory is most well-developed when it comes to linear system
identification and learning for the linear quadratic regulator, which are the
focus of this manuscript. From a theoretical perspective, much of the labor
underlying these advances has been in adapting tools from modern
high-dimensional statistics and learning theory. While highly relevant to
control theorists interested in integrating tools from machine learning, the
foundational material has not always been easily accessible. To remedy this, we
provide a self-contained presentation of the relevant material, outlining all
the key ideas and the technical machinery that underpin recent results. We also
present a number of open problems and future directions.
( 2
min )
Message Passing Neural Networks (MPNNs) are instances of Graph Neural
Networks that leverage the graph to send messages over the edges. This
inductive bias leads to a phenomenon known as over-squashing, where a node
feature is insensitive to information contained at distant nodes. Despite
recent methods introduced to mitigate this issue, an understanding of the
causes of over-squashing and of possible solutions is lacking. In this
theoretical work, we prove that: (i) Neural network width can mitigate
over-squashing, but at the cost of making the whole network more sensitive;
(ii) Conversely, depth cannot help mitigate over-squashing: increasing the
number of layers leads to over-squashing being dominated by vanishing
gradients; (iii) The graph topology plays the greatest role, since
over-squashing occurs between nodes at high commute (access) time. Our analysis
provides a unified framework to study different recent methods introduced to
cope with over-squashing and serves as a justification for a class of methods
that fall under `graph rewiring'.
( 2
min )
The learnable, linear neural network layers between tensor power spaces of
$\mathbb{R}^{n}$ that are equivariant to the orthogonal group, $O(n)$, the
special orthogonal group, $SO(n)$, and the symplectic group, $Sp(n)$, were
characterised in arXiv:2212.08630. We present an algorithm for multiplying a
vector by any weight matrix for each of these groups, using category theoretic
constructions to implement the procedure. We achieve a significant reduction in
computational cost compared with a naive implementation by making use of
Kronecker product matrices to perform the multiplication. We show that our
approach extends to the symmetric group, $S_n$, recovering the algorithm of
arXiv:2303.06208 in the process.
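The computational saving from Kronecker structure rests on the standard identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{\top})$ (row-major vec), which avoids materializing the Kronecker matrix. A NumPy sketch of the identity itself, unrelated to the paper's category-theoretic implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # factor acting on the "row" index
B = rng.standard_normal((5, 6))   # factor acting on the "column" index
X = rng.standard_normal((4, 6))

# Naive: materialize the (15 x 24) Kronecker matrix and multiply.
naive = np.kron(A, B) @ X.flatten()

# Structured: (A kron B) vec(X) = vec(A @ X @ B.T) for row-major vec,
# which never forms the Kronecker matrix explicitly.
fast = (A @ X @ B.T).flatten()

assert np.allclose(naive, fast)
```

For square $m \times m$ factors this reduces the cost of the multiplication from $\Theta(m^4)$ to $\Theta(m^3)$.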
( 2
min )
We study variance-dependent regret bounds for Markov decision processes
(MDPs). Algorithms with variance-dependent regret guarantees can automatically
exploit environments with low variance (e.g., enjoying constant regret on
deterministic MDPs). The existing algorithms are either variance-independent or
suboptimal. We first propose two new environment norms to characterize the
fine-grained variance properties of the environment. For model-based methods,
we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new
analysis techniques to show that this algorithm enjoys variance-dependent bounds
with respect to our proposed norms. In particular, this bound is simultaneously
minimax optimal for both stochastic and deterministic MDPs, the first result of
its kind. We further initiate the study on model-free algorithms with
variance-dependent regret bounds by designing a reference-function-based
algorithm with a novel capped-doubling reference update schedule. Lastly, we
also provide lower bounds to complement our upper bounds.
( 2
min )
Many techniques in machine learning attempt explicitly or implicitly to infer
a low-dimensional manifold structure of an underlying physical phenomenon from
measurements without an explicit model of the phenomenon or the measurement
apparatus. This paper presents a cautionary tale regarding the discrepancy
between the geometry of measurements and the geometry of the underlying
phenomenon in a benign setting. The deformation in the metric illustrated in
this paper is mathematically straightforward and unavoidable in the general
case, and it is only one of several similar effects. While this is not always
problematic, we provide an example of an arguably standard and harmless data
processing procedure where this effect leads to an incorrect answer to a
seemingly simple question. Although we focus on manifold learning, these issues
apply broadly to dimensionality reduction and unsupervised learning.
( 2
min )
Wartella and AI reinvigorate a White Stripes classic, exploring AI’s role in music video creation.
( 6
min )
Manufacturers often turn to digitalization strategies to improve their competitiveness, address labor shortages, and boost productivity. These efforts are driven by a desire to stay ahead of the game rather than simply defend against the competition. However, moving to the front foot regarding generated data unlocks waves of innovation — creating fast, bold, competitive,…
The post 3 Major Benefits Data Collection Brings To The Manufacturing Process appeared first on Data Science Central.
( 21
min )
I recently completed teaching my “Big Data MBA: Thinking Like a Data Scientist (TLADS)” class for the spring semester at Iowa State University. I had 17 second-year MBA students, and their diligence, passion, and creativity were evident throughout the semester and especially in the final project presentations. This class had no tests or mid-term exams…
( 21
min )
A new method could provide detailed information about internal structures, voids, and cracks, based solely on data about exterior conditions.
( 10
min )
Recent large language models (LLMs) have enabled tremendous progress in natural language understanding. However, they are prone to generating confident but nonsensical explanations, which poses a significant obstacle to establishing trust with users. In this post, we show how to incorporate human feedback on the incorrect reasoning chains for multi-hop reasoning to improve performance on […]
( 10
min )
Deep learning (DL) is a fast-evolving field, and practitioners are constantly innovating DL models and inventing ways to speed them up. Custom operators are one of the mechanisms developers use to push the boundaries of DL innovation by extending the functionality of existing machine learning (ML) frameworks such as PyTorch. In general, an operator describes […]
( 11
min )
Horror descends from the cloud this GFN Thursday with the arrival of publisher Capcom’s iconic Resident Evil series. They’re part of nine new games expanding the GeForce NOW library of over 1,600 titles. RTX 4080 SuperPODs are now live in Miami, Portland, Ore., and Stockholm. Follow along with the server rollout process, and make the
( 4
min )
We employ unsupervised machine learning to enhance the accuracy of our
recently presented scaling method for wave confinement analysis [1]. We use
both the standard k-means++ algorithm and our own model-based algorithm. We
investigate cluster validity indices as a means to find the correct number of
confinement dimensionalities to be used as an input to the clustering
algorithms. Subsequently, we analyze the performance of the two clustering
algorithms when compared to the direct application of the scaling method
without clustering. We find that the clustering approach provides more
physically meaningful results, but may struggle with identifying the correct
set of confinement dimensionalities. We conclude that the most accurate outcome
is obtained by first applying the direct scaling to find the correct set of
confinement dimensionalities and subsequently employing clustering to refine
the results. Moreover, our model-based algorithm outperforms the standard
k-means++ clustering.
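A minimal stand-in for the clustering step, assuming each data point is a low-dimensional feature vector derived from the scaling analysis; we use farthest-point seeding plus Lloyd iterations rather than the exact k-means++ sampling rule:

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Minimal Lloyd's k-means with farthest-point seeding (a deterministic
    cousin of k-means++), standing in for the clustering step."""
    centers = [X[0]]
    for _ in range(k - 1):
        # next seed: the point farthest from all chosen centers
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each point to its nearest center, then recompute the means
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated toy groups, mimicking two confinement dimensionalities
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centers = kmeans(X, k=2)
```

Cluster-validity indices (e.g., silhouette scores) would then be computed over a range of k to select the number of confinement dimensionalities.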
( 2
min )
We identify and explore connections between the recent literature on
multi-group fairness for prediction algorithms and the pseudorandomness notions
of leakage-resilience and graph regularity. We frame our investigation using
new, statistical distance-based variants of multicalibration that are closely
related to the concept of outcome indistinguishability. Adopting this
perspective leads us naturally not only to our graph theoretic results, but
also to new, more efficient algorithms for multicalibration in certain
parameter regimes and a novel proof of a hardcore lemma for real-valued
functions.
( 2
min )
In the paper, we propose a novel approach for solving Bayesian inverse
problems with physics-informed invertible neural networks (PI-INN). The
architecture of PI-INN consists of two sub-networks: an invertible neural
network (INN) and a neural basis network (NB-Net). With the aid of the NB-Net,
an invertible map is constructed between the parametric input and the INN
output to provide a tractable estimate of the posterior distribution, which enables
efficient sampling and accurate density evaluation. Furthermore, the loss
function of PI-INN includes two components: a residual-based physics-informed
loss term and a new independence loss term. The presented independence loss
term can Gaussianize the random latent variables and ensure statistical
independence between two parts of INN output by effectively utilizing the
estimated density function. Several numerical experiments are presented to
demonstrate the efficiency and accuracy of the proposed PI-INN, including
inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations,
and seismic traveltime tomography.
( 2
min )
Previous studies have shown that leveraging domain index can significantly
boost domain adaptation performance (arXiv:2007.01807, arXiv:2202.03628).
However, such domain indices are not always available. To address this
challenge, we first provide a formal definition of domain index from the
probabilistic perspective, and then propose an adversarial variational Bayesian
framework that infers domain indices from multi-domain data, thereby providing
additional insight on domain relations and improving domain adaptation
performance. Our theoretical analysis shows that our adversarial variational
Bayesian framework finds the optimal domain index at equilibrium. Empirical
results on both synthetic and real data verify that our model can produce
interpretable domain indices which enable us to achieve superior performance
compared to state-of-the-art domain adaptation methods. Code is available at
https://github.com/Wang-ML-Lab/VDI.
( 2
min )
Modern machine learning systems are increasingly trained on large amounts of
data embedded in high-dimensional spaces. Often this is done without analyzing
the structure of the dataset. In this work, we propose a framework to study the
geometric structure of the data. We make use of our recently introduced
non-negative kernel (NNK) regression graphs to estimate the point density,
intrinsic dimension, and the linearity of the data manifold (curvature). We
further generalize the graph construction and geometric estimation to multiple
scales by iteratively merging neighborhoods in the input data. Our experiments
demonstrate the effectiveness of our proposed approach over other baselines in
estimating the local geometry of the data manifolds on synthetic and real
datasets.
( 2
min )
Motor brain-computer interface (BCI) development relies critically on neural
time series decoding algorithms. Recent advances in deep learning architectures
allow for automatic feature selection to approximate higher-order dependencies
in data. This article presents the FingerFlex model - a convolutional
encoder-decoder architecture adapted for finger movement regression on
electrocorticographic (ECoG) brain data. State-of-the-art performance was
achieved on a publicly available BCI competition IV dataset 4 with a
correlation coefficient between true and predicted trajectories up to 0.74. The
presented method provides the opportunity for developing fully-functional
high-precision cortical motor brain-computer interfaces.
( 2
min )
Hardware Trojans (HTs) are undesired design or manufacturing modifications
that can severely alter the security and functionality of digital integrated
circuits. HTs can be inserted according to various design criteria, e.g., nets
switching activity, observability, controllability, etc. However, to our
knowledge, most HT detection methods are only based on a single criterion,
i.e., nets switching activity. This paper proposes a multi-criteria
reinforcement learning (RL) HT detection tool that features a tunable reward
function for different HT detection scenarios. The tool allows for exploring
existing detection strategies and can adapt to new detection scenarios with
minimal effort. We also propose a generic methodology for comparing HT
detection methods fairly. Our preliminary results show an average of 84.2%
successful HT detection on the ISCAS-85 benchmarks.
( 2
min )
The proposed BSDE-based diffusion model represents a novel approach to
diffusion modeling, which extends the application of stochastic differential
equations (SDEs) in machine learning. Unlike traditional SDE-based diffusion
models, our model can determine the initial conditions necessary to reach a
desired terminal distribution by adapting an existing score function. We
demonstrate the theoretical guarantees of the model, the benefits of using
Lipschitz networks for score matching, and its potential applications in
various areas such as diffusion inversion, conditional diffusion, and
uncertainty quantification. Our work represents a contribution to the field of
score-based generative learning and offers a promising direction for solving
real-world problems.
( 2
min )
In this paper we present the Zeitview Rooftop Geometry (ZRG) dataset. ZRG
contains thousands of samples of high-resolution orthomosaics of aerial imagery
of residential rooftops with corresponding digital surface models (DSM), 3D
rooftop wireframes, and multiview imagery generated point clouds for the
purpose of residential rooftop geometry and scene understanding. We perform
thorough benchmarks to illustrate the numerous applications unlocked by this
dataset and provide baselines for the tasks of roof outline extraction,
monocular height estimation, and planar roof structure extraction.
( 2
min )
We adapt reinforcement learning (RL) methods for continuous control to bridge
the gap between complete ignorance and perfect knowledge of the environment.
Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes
inspiration from both model-free RL and model-based control. It uses incomplete
information from a partial model and retains RL's data-driven adaptation towards
optimal performance. The linear quadratic regulator provides a case study;
numerical experiments demonstrate the effectiveness and resulting benefits of
the proposed method.
( 2
min )
In this study, toward addressing the over-confident outputs of existing
artificial intelligence-based colorectal cancer (CRC) polyp classification
techniques, we propose a confidence-calibrated residual neural network.
Utilizing a novel vision-based tactile sensing (VS-TS) system and unique CRC
polyp phantoms, we demonstrate that traditional metrics such as accuracy and
precision are not sufficient to encapsulate model performance for handling a
sensitive CRC polyp diagnosis. To this end, we develop a residual neural
network classifier and address its over-confident outputs for CRC polyps
classification via the post-processing method of temperature scaling. To
evaluate the proposed method, we introduce noise and blur to the obtained
textural images of the VS-TS and test the model's reliability for non-ideal
inputs through reliability diagrams and other statistical metrics.
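Temperature scaling itself is a one-parameter post-processing of the network's logits; a minimal NumPy sketch (the logits and temperature below are illustrative, and in practice $T$ is fit on a held-out validation set by minimizing the negative log-likelihood):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    """Soften over-confident outputs by dividing logits by a temperature T > 1.
    The predicted class is unchanged; only the confidence is recalibrated."""
    return softmax(logits / T)

logits = np.array([[8.0, 1.0, 1.0]])            # an over-confident prediction
raw = softmax(logits)                            # ~0.998 on the top class
calibrated = temperature_scale(logits, T=4.0)    # confidence is reduced
```

Because dividing by a positive scalar preserves the ordering of the logits, accuracy is unaffected while reliability diagrams improve.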
( 2
min )
Accurate detection of human presence in indoor environments is important for
various applications, such as energy management and security. In this paper, we
propose a novel system for human presence detection using the channel state
information (CSI) of WiFi signals. Our system named attention-enhanced deep
learning for presence detection (ALPD) employs an attention mechanism to
automatically select informative subcarriers from the CSI data and a
bidirectional long short-term memory (LSTM) network to capture temporal
dependencies in CSI. Additionally, we utilize a static feature to improve the
accuracy of human presence detection in static states. We evaluate the proposed
ALPD system by deploying a pair of WiFi access points (APs) to collect a CSI
dataset, which is further compared with several benchmarks. The results
demonstrate that our ALPD system outperforms the benchmarks in terms of
accuracy, especially in the presence of interference. Moreover, bidirectional
transmission data is beneficial to training, improving stability and accuracy,
as well as reducing the costs of data collection for training. Overall, our
proposed ALPD system shows promising results for human presence detection using
WiFi CSI signals.
( 2
min )
The challenges faced by text classification with large tag systems in natural
language processing tasks include multiple tag systems, uneven data
distribution, and high noise. To address these problems, the ESimCSE
unsupervised contrastive learning model and the UDA semi-supervised learning
model are combined through joint training. The ESimCSE model efficiently
learns text vector representations from unlabeled data to achieve better
classification results, while UDA is trained on unlabeled data through
semi-supervised learning to improve prediction performance and stability, and
to further improve the generalization ability of the model. In addition, adversarial training
techniques FGM and PGD are used in the model training process to improve the
robustness and reliability of the model. The experimental results show that
accuracy improves by 8% and 10% relative to the baseline on the public
Ruesters dataset and on the operational dataset, respectively, and a 15%
improvement in manual validation accuracy is achieved on the operational
dataset, indicating that the method is effective.
( 2
min )
We propose an experimental scheme for performing sensitive, high-precision
laser spectroscopy studies on fast exotic isotopes. By inducing a step-wise
resonant ionization of the atoms travelling inside an electric field and
subsequently detecting the ion and the corresponding electron, time- and
position-sensitive measurements of the resulting particles can be performed.
Using a Mixture Density Network (MDN), we can leverage this information to
predict the initial energy of individual atoms and thus apply a Doppler
correction of the observed transition frequencies on an event-by-event basis.
We conduct numerical simulations of the proposed experimental scheme and show
that kHz-level uncertainties can be achieved for ion beams produced at extreme
temperatures ($> 10^8$ K), with energy spreads as large as $10$ keV and
non-uniform velocity distributions. The ability to perform in-flight
spectroscopy, directly on highly energetic beams, offers unique opportunities
to study short-lived isotopes with lifetimes in the millisecond range and
below, produced in low quantities, in hot and highly contaminated environments,
without the need for cooling techniques. Such species are of marked interest
for nuclear structure, astrophysics, and new physics searches.
( 2
min )
In this paper, we introduce a new nonlinear channel equalization method for
the coherent long-haul transmission based on Transformers. We show that due to
their capability to attend directly to the memory across a sequence of symbols,
Transformers can be used effectively with a parallelized structure. We present
an implementation of the encoder part of the Transformer for nonlinear equalization and
analyze its performance over a wide range of different hyper-parameters. It is
shown that by processing blocks of symbols at each iteration and carefully
selecting subsets of the encoder's output to be processed together, an
efficient nonlinear compensation can be achieved. We also propose the use of a
physics-informed mask inspired by nonlinear perturbation theory for reducing the
computational complexity of Transformer nonlinear equalization.
( 2
min )
In this paper, we investigate the robustness of an LSTM neural network
against noise injection attacks for electric load forecasting in an ideal
microgrid. The performance of the LSTM model is investigated under a black-box
Gaussian noise attack with different SNRs. It is assumed that attackers have
just access to the input data of the LSTM model. The results show that the
noise attack affects the performance of the LSTM model. The load prediction
mean absolute error (MAE) is 0.047 MW for a clean input, while this
value increases to 0.097 MW under Gaussian noise injection with SNR = 6 dB.
To robustify the LSTM model against noise attack, a low-pass filter with
optimal cut-off frequency is applied at the model's input to remove the noise
attack. The filter performs better for noise with lower SNR and is less
effective against weak noise.
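The attack-and-defend pipeline above can be sketched with NumPy: inject zero-mean Gaussian noise scaled to a target SNR, then low-pass the input. The moving-average filter and the synthetic load curve are simplifications; the paper uses a filter with an optimized cut-off frequency on real microgrid data:

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Inject zero-mean Gaussian noise scaled to a target SNR in dB."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

def moving_average(x, w):
    """A simple FIR low-pass filter (moving average of width w)."""
    return np.convolve(x, np.ones(w) / w, mode="same")

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
load = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)      # slowly varying toy "load"
noisy = add_noise_at_snr(load, snr_db=6.0, rng=rng)
filtered = moving_average(noisy, w=25)

mae_noisy = np.mean(np.abs(noisy - load))
mae_filtered = np.mean(np.abs(filtered - load))    # filtering reduces the error
```

Because the load varies slowly relative to the noise, averaging suppresses the high-frequency attack while leaving the signal largely intact.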
( 2
min )
The average treatment effect, which is the difference in expectation of the
counterfactuals, is probably the most popular target effect in causal inference
with binary treatments. However, treatments may have effects beyond the mean,
for instance decreasing or increasing the variance. We propose a new
kernel-based test for distributional effects of the treatment. It is, to the
best of our knowledge, the first kernel-based, doubly-robust test with provably
valid type-I error. Furthermore, our proposed algorithm is efficient, avoiding
the use of permutations.
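As background for the kernel statistic involved (this sketch is the plain biased MMD² estimator, not the paper's doubly-robust test), two samples with equal means but unequal variances are distinguishable even though a mean-difference test sees nothing:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF (Gaussian) kernel matrix between two 1-D samples."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
low_var = rng.normal(0.0, 1.0, 500)
high_var = rng.normal(0.0, 3.0, 500)   # equal means, unequal variances

stat = mmd2(low_var, high_var)                     # large: distributions differ
null_stat = mmd2(low_var, rng.normal(0.0, 1.0, 500))  # small: same distribution
```

The paper's contribution layers double robustness and a permutation-free calibration on top of this kind of kernel statistic.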
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
We study Langevin-type algorithms for sampling from Gibbs distributions such
that the potentials are dissipative and their weak gradients have finite moduli
of continuity not necessarily convergent to zero. Our main result is a
non-asymptotic upper bound of the 2-Wasserstein distance between the Gibbs
distribution and the law of general Langevin-type algorithms based on the
Liptser--Shiryaev theory and Poincar\'{e} inequalities. We apply this bound to
show that the Langevin Monte Carlo algorithm can approximate Gibbs
distributions with arbitrary accuracy if the potentials are dissipative and
their gradients are uniformly continuous. We also propose Langevin-type
algorithms with spherical smoothing for potentials without convexity or
continuous differentiability.
( 2
min )
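As a minimal sketch of the Langevin Monte Carlo algorithm discussed in the abstract above (not the smoothed variants the paper proposes), the following samples from the Gibbs distribution of a toy dissipative potential U(x) = x^2/2, whose Gibbs measure is the standard normal; the step size, iteration count, and quadratic potential are illustrative choices, not the paper's setting:

```python
import numpy as np

def langevin_monte_carlo(grad_U, x0, step=0.01, n_steps=20000, rng=None):
    """Unadjusted Langevin algorithm: x_{k+1} = x_k - step*grad_U(x_k) + sqrt(2*step)*xi_k."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Target: Gibbs distribution exp(-U) with U(x) = x^2 / 2, i.e. N(0, 1).
samples = langevin_monte_carlo(lambda x: x, x0=np.array([3.0]))
burned = samples[5000:]  # discard burn-in
print(float(burned.mean()), float(burned.var()))
```

With a sufficiently small step size, the empirical mean and variance of the chain after burn-in approach those of the target Gibbs distribution, up to the discretization bias the paper's bounds quantify.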
We describe a direct approach to estimate bipartite mutual information of a
classical spin system based on Monte Carlo sampling enhanced by autoregressive
neural networks. It allows studying arbitrary geometries of subsystems and can
be generalized to classical field theories. We demonstrate it on the Ising
model for four partitionings, including a multiply-connected even-odd division.
We show that the area law is satisfied for temperatures away from the critical
temperature: the constant term is universal, whereas the proportionality
coefficient is different for the even-odd partitioning.
( 2
min )
The Hierarchical Vote Collective of Transformation-based Ensembles
(HIVE-COTE) is a heterogeneous meta ensemble for time series classification.
Since it was first proposed in 2016, the algorithm has undergone some minor
changes and there is now a configurable, scalable and easy to use version
available in two open source repositories. We present an overview of the latest
stable HIVE-COTE, version 1.0, and describe how it differs from the original. We
provide a walkthrough guide of how to use the classifier, and conduct extensive
experimental evaluation of its predictive performance and resource usage. We
compare the performance of HIVE-COTE to three recently proposed algorithms
using the aeon toolkit.
( 2
min )
Forecast reconciliation is an important research topic. Yet, there is
currently neither a formal framework nor a practical method for the
probabilistic reconciliation of count time series. In this paper we propose a
definition of coherent and reconciled probabilistic forecasts that applies to
both real-valued and count variables, along with a novel method for
probabilistic reconciliation. The method is based on a generalization of Bayes'
rule and can reconcile both real-valued and count variables. When applied to count variables,
it yields a reconciled probability mass function. Our experiments with the
temporal reconciliation of count variables show a major forecast improvement
compared to the probabilistic Gaussian reconciliation.
( 2
min )
In this work, we study the performance of the Thompson Sampling algorithm for
Contextual Bandit problems based on the framework introduced by Neu et al. and
their concept of lifted information ratio. First, we prove a comprehensive
bound on the Thompson Sampling expected cumulative regret that depends on the
mutual information of the environment parameters and the history. Then, we
introduce new bounds on the lifted information ratio that hold for sub-Gaussian
rewards, thus generalizing the results from Neu et al., whose analysis requires
binary rewards. Finally, we provide explicit regret bounds for the special
cases of unstructured bounded contextual bandits, structured bounded contextual
bandits with Laplace likelihood, structured Bernoulli bandits, and bounded
linear contextual bandits.
( 2
min )
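The abstract above analyzes Thompson Sampling in several structured settings; as a hedged illustration, the sketch below implements only the classical non-contextual Bernoulli bandit case with independent Beta(1,1) priors, where the posterior update is conjugate. The arm means and horizon are made up for the example:

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Thompson Sampling for a Bernoulli bandit with independent Beta(1,1) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # 1 + observed successes per arm
    beta = np.ones(k)   # 1 + observed failures per arm
    reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # sample a plausible mean for each arm
        arm = int(np.argmax(theta))     # play the arm that looks best under the sample
        r = float(rng.random() < true_means[arm])
        alpha[arm] += r
        beta[arm] += 1.0 - r
        reward += r
    return reward

total = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000)
print(0.8 * 2000 - total)  # cumulative regret against the best arm
```

The cumulative regret grows sublinearly in the horizon, which is the qualitative behavior the information-ratio bounds in the abstract formalize.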
In this article, we study the implementation of recommender systems: how they
work and the algorithms they use. We explain recommender-system algorithms in
terms of mathematical principles and identify feasible methods for improvement.
Probability-based algorithms play a significant role in recommender systems,
and we describe how they help to increase the accuracy and speed of the
algorithms. Both the weaknesses and the strengths of two different mathematical
distances used to measure similarity are illustrated in detail in this article.
( 2
min )
Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production. This post describes how to implement your first ML use case using Amazon […]
( 9
min )
Spotlighted by this week’s In the NVIDIA Studio featured artist Unmesh Dinda, NVIDIA Broadcast transforms the homes, apartments and dorm rooms of content creators, livestreamers and people working from home through the power of AI — all without the need for specialized equipment.
( 7
min )
Imagine a future where your vehicle’s interior offers personalized experiences and builds trust through human-machine interfaces (HMI) and AI. In this episode of the NVIDIA AI Podcast, Andreas Binner, chief technology officer at Rightware, delves into this fascinating topic with host Katie Burke Washabaugh. Rightware is a Helsinki-based company at the forefront of developing in-vehicle
( 5
min )
We recently introduced a new capability in the Amazon SageMaker Python SDK that lets data scientists run their machine learning (ML) code authored in their preferred integrated developer environment (IDE) and notebooks along with the associated runtime dependencies as Amazon SageMaker training jobs with minimal code changes to the experimentation done locally. Data scientists typically […]
( 13
min )
Many organizations use Gmail for their business email needs. Gmail for Business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Google Docs, Google Sheets, and more. For any organization, emails contain a wealth of information, which could be within the subject of an email, the message […]
( 9
min )
Announcements Tech Layoffs and Uncertainty Raise Big Questions for Higher Education Mass layoffs continue across the tech industry, with tens of thousands of workers losing their jobs in the first quarter of 2023. The reductions occurred from small startups to the biggest names in tech — Google, Amazon, Microsoft. Core technical roles such as data…
The post DSC Weekly 25 April 2023 – Tech Layoffs and Uncertainty Raise Big Questions for Higher Education appeared first on Data Science Central.
( 19
min )
Internal CPU Accelerators and HBM Enable Faster and Smarter HPC and AI Applications We have now entered the era when processor designers can leverage modular semiconductor manufacturing capabilities to speed frequently performed operations (such as small tensor operations) and offload a variety of housekeeping tasks (such as copying and zeroing memory) to dedicated on-chip accelerators. The…
( 33
min )
Newly released open-source software can help developers guide generative AI applications to create impressive text responses that stay on track. NeMo Guardrails will help ensure smart applications powered by large language models (LLMs) are accurate, appropriate, on topic and secure. The software includes all the code, examples and documentation businesses need to add safety to
( 6
min )
ChatGPT users can now turn off chat history, allowing you to choose which conversations can be used to train our models.
( 2
min )
In the world of machine learning (ML), the quality of the dataset is of significant importance to model predictability. Although more data is usually better, large datasets with a high number of features can sometimes lead to non-optimal model performance due to the curse of dimensionality. Analysts can spend a significant amount of time transforming […]
( 9
min )
According to a PWC report, 32% of retail customers churn after one negative experience, and 73% of customers say that customer experience influences their purchase decisions. In the global retail industry, pre- and post-sales support are both important aspects of customer care. Numerous methods, including email, live chat, bots, and phone calls, are used to […]
( 8
min )
TLA+ is a high level, open-source, math-based language for modeling computer programs and systems–especially concurrent and distributed ones. It comes with tools to help eliminate fundamental design errors, which are hard to find and expensive to fix once they have been embedded in code or hardware. The TLA language was first published in 1993 by the […]
The post TLA+ Foundation aims to bring math-based software modeling to the mainstream appeared first on Microsoft Research.
( 9
min )
Along with Markov chain Monte Carlo (MCMC) methods, variational inference
(VI) has emerged as a central computational approach to large-scale Bayesian
inference. Rather than sampling from the true posterior $\pi$, VI aims at
producing a simple but effective approximation $\hat \pi$ to $\pi$ for which
summary statistics are easy to compute. However, unlike the well-studied MCMC
methodology, algorithmic guarantees for VI are still not well understood. In
this work, we propose principled methods for VI, in which
$\hat \pi$ is taken to be a Gaussian or a mixture of Gaussians, which rest upon
the theory of gradient flows on the Bures--Wasserstein space of Gaussian
measures. Akin to MCMC, it comes with strong theoretical guarantees when $\pi$
is log-concave.
( 2
min )
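A minimal sketch in the spirit of the abstract above: Bures-Wasserstein gradient descent for Gaussian VI against a one-dimensional Gaussian target, where the required expectations of the potential's gradient and Hessian are available in closed form. The update rule used here, m ← m − h E[∇V] and Σ ← MΣM with M = I − h(E[∇²V] − Σ⁻¹), is the standard form from the gradient-flow literature and is not claimed to be the paper's exact algorithm:

```python
# Bures-Wasserstein gradient descent for Gaussian VI (1D sketch).
# Target pi = N(mu, sigma2), potential V(x) = (x - mu)^2 / (2 * sigma2),
# so E[grad V] = (m - mu) / sigma2 and E[Hess V] = 1 / sigma2 in closed form.
mu, sigma2 = 2.0, 4.0
m, S = 0.0, 1.0   # variational mean and variance
h = 0.1           # step size
for _ in range(500):
    m = m - h * (m - mu) / sigma2        # mean update: m <- m - h * E[grad V]
    M = 1.0 - h * (1.0 / sigma2 - 1.0 / S)
    S = M * S * M                        # covariance update: S <- M S M
print(m, S)
```

For this log-concave target the iterates converge to the true parameters (m, S) = (2, 4), illustrating the kind of guarantee the abstract refers to.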
We consider using gradient descent to minimize the nonconvex function
$f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is
an underlying smooth convex cost function defined over $n\times n$ matrices.
While only a second-order stationary point $X$ can be provably found in
reasonable time, if $X$ is additionally rank deficient, then its rank
deficiency certifies it as being globally optimal. This way of certifying
global optimality necessarily requires the search rank $r$ of the current
iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the
global minimizer $X^{\star}$. Unfortunately, overparameterization significantly
slows down the convergence of gradient descent, from a linear rate with
$r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is
strongly convex. In this paper, we propose an inexpensive preconditioner that
restores the convergence rate of gradient descent back to linear in the
overparameterized case, while also making it agnostic to possible
ill-conditioning in the global minimizer $X^{\star}$.
( 2
min )
Neutron scattering experiments at three-axes spectrometers (TAS) investigate
magnetic and lattice excitations by measuring intensity distributions to
understand the origins of materials properties. The high demand and limited
availability of beam time for TAS experiments, however, raise the natural
question of whether we can improve their efficiency and make better use of the
experimenter's time. In fact, there are a number of scientific problems that
require searching for signals, which may be time consuming and inefficient if
done manually due to measurements in uninformative regions. Here, we describe a
probabilistic active learning approach that not only runs autonomously, i.e.,
without human interference, but can also directly provide locations for
informative measurements in a mathematically sound and methodologically robust
way by exploiting log-Gaussian processes. Ultimately, we demonstrate the
resulting benefits on a real TAS experiment and on a benchmark including
numerous different excitations.
( 2
min )
Conservative inference is a major concern in simulation-based inference. It
has been shown that commonly used algorithms can produce overconfident
posterior approximations. Balancing has empirically proven to be an effective
way to mitigate this issue. However, its application remains limited to neural
ratio estimation. In this work, we extend balancing to any algorithm that
provides a posterior density. In particular, we introduce a balanced version of
both neural posterior estimation and contrastive neural ratio estimation. We
show empirically that the balanced versions tend to produce conservative
posterior approximations on a wide variety of benchmarks. In addition, we
provide an alternative interpretation of the balancing condition in terms of
the $\chi^2$ divergence.
( 2
min )
Recent breakthroughs in NLP have greatly increased the presence of ASR systems
in our daily lives. However, for many low-resource languages, ASR models still
need to be improved due in part to the difficulty of acquiring pertinent data.
This project aims to help advance research in ASR models for Swiss German
dialects, by providing insights about the performance of state-of-the-art ASR
models on recently published Swiss German speech datasets. We propose a novel
loss that takes into account the semantic distance between the predicted and
the ground-truth labels. We outperform current state-of-the-art results by
fine-tuning OpenAI's Whisper model on Swiss-German datasets.
( 2
min )
This article presents a leak localization methodology based on state
estimation and learning. The first is handled by an interpolation scheme,
whereas dictionary learning is considered for the second stage. The novel
proposed interpolation technique exploits the physics of the interconnections
between hydraulic heads of neighboring nodes in water distribution networks.
Additionally, residuals are directly interpolated instead of hydraulic head
values. The results of applying the proposed method to a well-known case study
(Modena) demonstrated the improvements of the new interpolation method with
respect to a state-of-the-art approach, both in terms of interpolation error
(considering state and residual estimation) and posterior localization.
( 2
min )
The use of machine learning (ML) inference for various applications is
growing drastically. ML inference services engage with users directly,
requiring fast and accurate responses. Moreover, these services face dynamic
workloads of requests, requiring their computing resources to change. Failing
to right-size computing resources results in either violations of latency
service-level objectives (SLOs) or wasted computing resources. Adapting to dynamic
workloads considering all the pillars of accuracy, latency, and resource cost
is challenging. In response to these challenges, we propose InfAdapter, which
proactively selects a set of ML model variants with their resource allocations
to meet latency SLO while maximizing an objective function composed of accuracy
and cost. InfAdapter decreases SLO violation and costs up to 65% and 33%,
respectively, compared to a popular industry autoscaler (Kubernetes Vertical
Pod Autoscaler).
( 2
min )
Deploying machine learning models in production may allow adversaries to
infer sensitive information about training data. There is a vast literature
analyzing different types of inference risks, ranging from membership inference
to reconstruction attacks. Inspired by the success of games (i.e.,
probabilistic experiments) to study security properties in cryptography, some
authors describe privacy inference risks in machine learning using a similar
game-based style. However, adversary capabilities and goals are often stated in
subtly different ways from one presentation to the other, which makes it hard
to relate and compose results. In this paper, we present a game-based framework
to systematize the body of knowledge on privacy inference risks in machine
learning. We use this framework to (1) provide a unifying structure for
definitions of inference risks, (2) formally establish known relations among
definitions, and (3) uncover hitherto unknown relations that would have been
difficult to spot otherwise.
( 2
min )
Hyperparameter optimization (HPO) is crucial for the strong performance of deep
learning algorithms, and real-world applications often impose constraints such
as memory usage or latency on top of the performance requirement. In this
work, we propose constrained TPE (c-TPE), an extension of the widely-used
versatile Bayesian optimization method, tree-structured Parzen estimator (TPE),
to handle these constraints. Our proposed extension goes beyond a simple
combination of an existing acquisition function and the original TPE, and
instead includes modifications that address issues that cause poor performance.
We thoroughly analyze these modifications both empirically and theoretically,
providing insights into how they effectively overcome these challenges. In the
experiments, we demonstrate that c-TPE exhibits the best average rank
performance among existing methods with statistical significance on 81
expensive HPO settings.
( 2
min )
How do you scale a machine learning product at a startup? In particular, how
do you serve a greater volume, velocity, and variety of queries
cost-effectively? We break costs down into variable costs (the cost of serving
the model performantly) and fixed costs (the cost of developing and training
new models). We propose a framework for conceptualizing these costs, break
them into finer categories, and outline ways to reduce them. Lastly, since in our
experience, the most expensive fixed cost of a machine learning system is the
cost of identifying the root causes of failures and driving continuous
improvement, we present a way to conceptualize the issues and share our
methodology for the same.
( 2
min )
We introduce a novel self-attention mechanism, which we call CSA (Chromatic
Self-Attention), which extends the notion of attention scores to attention
_filters_, independently modulating the feature channels. We showcase CSA in a
fully-attentional graph Transformer CGT (Chromatic Graph Transformer) which
integrates both graph structural information and edge features, completely
bypassing the need for local message-passing components. Our method flexibly
encodes graph structure through node-node interactions, by enriching the
original edge features with a relative positional encoding scheme. We propose a
new scheme based on random walks that encodes both structural and positional
information, and show how to incorporate higher-order topological information,
such as rings in molecular graphs. Our approach achieves state-of-the-art
results on the ZINC benchmark dataset, while providing a flexible framework for
encoding graph structure and incorporating higher-order topology.
( 2
min )
This article presents an identification benchmark based on data from a public
swimming pool in operation. Such a system is both a complex process and one
whose stakes are easily understood by all. Ultimately, the objective is
to reduce the energy bill while maintaining the level of quality of service.
This objective is general in scope and is not limited to public swimming pools.
This can be done effectively through what is known as economic predictive
control. This type of advanced control is based on a process model. It is the
aim of this article and the considered benchmark to show that such a dynamic
model can be obtained from operating data. For this, operational data is
formatted and shared, and model quality indicators are proposed. On this basis,
first identification results are presented for a linear multivariable model on
the one hand and a neural dynamic model on the other. The benchmark calls for
other proposals and results from control and data
scientists for comparison.
( 2
min )
This short note describes and proves a connectedness property which was
introduced in Blocher et al. [2023] in the context of data depth functions for
partial orders. The connectedness property gives a structural insight into
union-free generic sets. These sets, presented in Blocher et al. [2023], are
defined by using a closure operator on the set of all partial orders which
naturally appears within the theory of formal concept analysis. In the language
of formal concept analysis, the property of connectedness can be proven
transparently. However, since we did not discuss formal concept analysis within
Blocher et al. [2023], we outsourced the proof to this note.
( 2
min )
Exploration is a fundamental aspect of reinforcement learning (RL), and its
effectiveness critically determines the performance of RL algorithms,
especially when extrinsic rewards are sparse. Recent studies have shown the
effectiveness of encouraging exploration with intrinsic rewards estimated from
the novelty of observations. However, there is a gap between the novelty of an
observation and exploration in general, because the stochasticity of the
environment as well as the behavior of the agent may affect the observation. To estimate exploratory
behaviors accurately, we propose DEIR, a novel method where we theoretically
derive an intrinsic reward from a conditional mutual information term that
principally scales with the novelty contributed by agent explorations, and
materialize the reward with a discriminative forward model. We conduct
extensive experiments in both standard and hardened exploration games in
MiniGrid to show that DEIR quickly learns a better policy than baselines. Our
evaluations in ProcGen demonstrate both generalization capabilities and the
general applicability of our intrinsic reward.
( 2
min )
Recent years have seen a rich literature of data-driven approaches designed
for power grid applications. However, insufficient consideration of domain
knowledge can impose a high risk to the practicality of the methods.
Specifically, ignoring the grid-specific spatiotemporal patterns (in load,
generation, and topology, etc.) can lead to outputting infeasible,
unrealizable, or completely meaningless predictions on new inputs. To address
this concern, this paper investigates real-world operational data to provide
insights into power grid behavioral patterns, including the time-varying
topology, load, and generation, as well as the spatial differences (in peak
hours, diverse styles) between individual loads and generations. Then based on
these observations, we evaluate the generalization risks in some existing ML
works caused by ignoring these grid-specific patterns in model design and
training.
( 2
min )
It is difficult to identify anomalies in time series, especially when there
is a lot of noise. Denoising techniques can remove the noise, but at the cost
of a significant loss of information. To detect anomalies in time series, we
propose an attention-free conditional autoencoder (AF-CA). We started from the
conditional autoencoder model, to which we added an Attention-Free LSTM layer
\cite{inzirillo2022attention} in order to make anomaly detection more reliable
and more powerful. We compared the results of our Attention-Free Conditional
Autoencoder with those of an LSTM Autoencoder and clearly improved the
explanatory power of the model and therefore the detection of anomalies in
noisy time series.
( 2
min )
This article measures how sparsity can make neural networks more robust to
membership inference attacks. The obtained empirical results show that sparsity
improves the privacy of the network, while preserving comparable performances
on the task at hand. This empirical study completes and extends existing
literature.
( 2
min )
In Part I of the series “Creating Healthy AI Utility Function: Importance of Diversity,” I talked about the importance of embracing conflict and diversity to create a Healthy AI Utility Function; that is, creating an AI Utility Function that continuously balances conflicting KPIs and metrics to deliver responsible and ethical outcomes. The AI Utility Function…
( 21
min )
Just in last 1 year, top 0.1% saw their wealth increase by 6 trillion dollars, bigger than wealth of most countries. https://www.cnbc.com/amp/2022/04/01/richest-one-percent-gained-trillions-in-wealth-2021.html
( 43
min )
For those of you interested in diving into the future of AI with some of the worlds leading AI experts, my company is hosting this free virtual event.
Kris Hammond (advises the U.N. and White House on AI) and his Northwestern students built us a custom AI/deepfake chat bot that will actually be on the panel answering questions and engaging in discussion…talk about Black Mirror situations. It should get interesting.
For those getting into AI or that understand how important it is for remaining competitive in your career, you should def check it out.
Here’s a link: https://chicagoinnovation.com/events/ai-vs-iq/
( 43
min )
With the advances of IoT developments, copious sensor data are communicated
through wireless networks and create the opportunity of building Digital Twins
to mirror and simulate the complex physical world. Digital Twin has long been
believed to rely heavily on domain knowledge, but we argue that this leads to a
high barrier of entry and slow development due to the scarcity and cost of
human experts. In this paper, we propose Digital Twin Graph (DTG), a general
data structure associated with a processing framework that constructs digital
twins in a fully automated and domain-agnostic manner. This work represents the
first effort that takes a completely data-driven and (unconventional) graph
learning approach to address key digital twin challenges.
( 2
min )
This study proposes a deep learning model for the classification and
segmentation of brain tumors from magnetic resonance imaging (MRI) scans. The
classification model is based on the EfficientNetB1 architecture and is trained
to classify images into four classes: meningioma, glioma, pituitary adenoma,
and no tumor. The segmentation model is based on the U-Net architecture and is
trained to accurately segment the tumor from the MRI images. The models are
evaluated on a publicly available dataset and achieve high accuracy and
segmentation metrics, indicating their potential for clinical use in the
diagnosis and treatment of brain tumors.
( 2
min )
Questions remain on the robustness of data-driven learning methods when
crossing the gap from simulation to reality. We utilize weight anchoring, a
method known from continual learning, to cultivate and fixate desired behavior
in Neural Networks. Weight anchoring may be used to find a solution to a
learning problem that is close to the solution of another learning problem.
Thereby, learning can be carried out in optimal environments without neglecting
or unlearning desired behavior. We demonstrate this approach on the example of
learning mixed QoS-efficient discrete resource scheduling with infrequent
priority messages. Results show that this method provides performance
comparable to the state of the art of augmenting a simulation environment,
alongside significantly increased robustness and steerability.
( 2
min )
This work brings the leading accuracy, sample efficiency, and robustness of
deep equivariant neural networks to the extreme computational scale. This is
achieved through a combination of innovative model architecture, massive
parallelization, and models and implementations optimized for efficient GPU
utilization. The resulting Allegro architecture bridges the accuracy-speed
tradeoff of atomistic simulations and enables description of dynamics in
structures of unprecedented complexity at quantum fidelity. To illustrate the
scalability of Allegro, we perform nanoseconds-long stable simulations of
protein dynamics and scale up to a 44-million atom structure of a complete,
all-atom, explicitly solvated HIV capsid on the Perlmutter supercomputer. We
demonstrate excellent strong scaling up to 100 million atoms and 70% weak
scaling to 5120 A100 GPUs.
( 2
min )
The K Nearest Neighbors (KNN) classifier is widely used in many fields such
as fingerprint-based localization or medicine. It determines the class
membership of an unlabelled sample based on the class memberships of the K
labelled samples, the so-called nearest neighbors, that are closest to the
unlabelled sample. The choice of K has been the topic of various studies and
proposed KNN-variants. Yet no variant has been proven to outperform all other
variants. In this paper a new KNN-variant is proposed which ensures that the K
nearest neighbors are indeed close to the unlabelled sample and finds K along
the way. The proposed algorithm is tested and compared to the standard KNN in
theoretical scenarios and for indoor localization based on ion-mobility
spectrometry fingerprints. It achieves a higher classification accuracy than
the standard KNN in the tests, while having the same computational demand.
( 2
min )
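For reference, a minimal sketch of the standard KNN classifier that the proposed variant is compared against; the variant itself (which ensures the neighbors are genuinely close and finds K along the way) is not reproduced, and the toy data below are made up:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Standard KNN: majority vote among the k training points closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # most frequent neighbor label

# Two well-separated toy classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.15, 0.1])))
print(knn_predict(X_train, y_train, np.array([5.0, 5.1])))
```

The fixed k here is exactly the design choice the paper questions: neighbors counted toward the vote may in fact be far from the query, which motivates choosing K adaptively.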
Kernel-based modal statistical methods include mode estimation, regression,
and clustering. Estimation accuracy of these methods depends on the kernel used
as well as the bandwidth. We study the effect of kernel selection on the
estimation accuracy of these methods. In particular, we
theoretically show a (multivariate) optimal kernel that minimizes its
analytically-obtained asymptotic error criterion when using an optimal
bandwidth, among a certain kernel class defined via the number of its sign
changes.
( 2
min )
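As a concrete instance of kernel-based mode estimation, the mean-shift sketch below ascends a Gaussian-kernel density estimate; the kernel choice, bandwidth, and data are illustrative only, and the paper's optimal-kernel analysis is not reproduced:

```python
import numpy as np

def mean_shift_mode(data, x0, bandwidth=0.5, n_iter=100):
    """Kernel mode estimation: mean-shift ascent on a 1D Gaussian KDE."""
    x = float(x0)
    for _ in range(n_iter):
        w = np.exp(-0.5 * ((data - x) / bandwidth) ** 2)  # Gaussian kernel weights
        x = float(np.sum(w * data) / np.sum(w))           # weighted-mean (shift) update
    return x

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=5000)
mode = mean_shift_mode(data, x0=0.0)
print(mode)
```

The estimate converges to a local mode of the kernel density estimate, near the true mode 2.0 here; the accuracy of such estimates is exactly what depends on the kernel and bandwidth choices the abstract studies.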
Quantum computation has strong implications for overcoming the current
limitations of machine learning algorithms in dealing with higher data
dimensions or for reducing the overall number of training parameters in a deep
neural network model. Based on a gate-based quantum computer, a parameterized
quantum circuit (PQC) was designed to solve a model-free reinforcement learning
problem with the deep Q-learning method, and this research investigates and
evaluates its potential. To this end, a novel PQC based on the latest Qiskit
and PyTorch frameworks was designed and trained for comparison with a fully
classical deep neural network with and without an integrated PQC. The research
concludes with its findings and prospects for developing deep quantum learning
to solve a maze problem or other reinforcement learning problems.
( 2
min )
This paper presents two novel deterministic initialization procedures for
K-means clustering based on a modified crowding distance. The procedures, named
CKmeans and FCKmeans, use more crowded points as initial centroids.
Experimental studies on multiple datasets demonstrate that the proposed
approach outperforms Kmeans and Kmeans++ in terms of clustering accuracy. The
effectiveness of CKmeans and FCKmeans is attributed to their ability to select
better initial centroids based on the modified crowding distance. Overall, the
proposed approach provides a promising alternative for improving K-means
clustering.
( 2
min )
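The paper's modified crowding distance is not reproduced here, but the hedged sketch below illustrates the general idea of deterministic, density-aware initialization: rank points by how crowded their neighborhood is, pick the most crowded points that are mutually well separated as initial centroids, then run standard Lloyd iterations. The radius parameter and the toy two-cluster data are assumptions made for this example:

```python
import numpy as np

def density_init(X, k, radius):
    """Deterministic init: pick the k most crowded points (a stand-in for the
    paper's modified crowding distance), kept mutually at least `radius` apart."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = (d < radius).sum(axis=1)       # neighbor count within `radius`
    chosen = []
    for i in np.argsort(-density):           # most crowded points first
        if all(d[i, j] > radius for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return X[chosen]

def lloyd(X, centroids, n_iter=20):
    """Standard Lloyd iterations from the given initial centroids."""
    for _ in range(n_iter):
        assign = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        centroids = np.array([X[assign == c].mean(axis=0) for c in range(len(centroids))])
    return centroids, assign

rng = np.random.default_rng(0)
X = np.vstack([rng.normal((0.0, 0.0), 0.3, (50, 2)),
               rng.normal((4.0, 4.0), 0.3, (50, 2))])
cents, assign = lloyd(X, density_init(X, k=2, radius=2.0))
print(np.sort(cents[:, 0]))
```

Because the initialization is deterministic, repeated runs on the same data give the same clustering, unlike random or k-means++ seeding.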
We used survival analysis to quantify the impact of postdischarge evaluation
and management (E/M) services in preventing hospital readmission or death. Our
approach avoids a specific pitfall of applying machine learning to this
problem, which is an inflated estimate of the effect of interventions, due to
survivor bias -- where the magnitude of inflation may be conditional on
heterogeneous confounders in the population. This bias arises simply because in
order to receive an intervention after discharge, a person must not have been
readmitted in the intervening period. After deriving an expression for this
phantom effect, we controlled for this and other biases within an inherently
interpretable Bayesian survival framework. We identified case management
services as being the most impactful for reducing readmissions overall,
particularly for patients discharged to long term care facilities, with high
resource utilization in the quarter preceding admission.
( 2
min )
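The phantom effect can be made concrete with a tiny simulation (numbers are invented, not the paper's data): give the post-discharge visit zero true effect, but only to patients who are still readmission-free on the visit date, and a naive comparison still reports strong "protection":

```python
import random

random.seed(0)
N, FOLLOWUP, VISIT_DAY = 10000, 90, 30

treated, control = [], []
for _ in range(N):
    t_readmit = random.expovariate(1 / 60)  # true hazard independent of the visit
    got_visit = t_readmit > VISIT_DAY       # visit only happens if still event-free
    readmitted = t_readmit <= FOLLOWUP
    (treated if got_visit else control).append(readmitted)

rate_treated = sum(treated) / len(treated)
rate_control = sum(control) / len(control)
# The visit has zero true effect, yet the naive comparison shows a large
# "protective" phantom effect: by construction, everyone in the control
# group was readmitted before day 30.
```

Here `rate_control` is exactly 1.0 while `rate_treated` is roughly 1 - e^{-1} ≈ 0.63, a large spurious gap that a survival model must explicitly control for.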
We study the impacts of business cycles on machine learning (ML) predictions.
Using the S&P 500 index, we find that ML models perform worse during most
recessions, and the inclusion of recession history or the risk-free rate does
not necessarily improve their performance. Investigating recessions where
models perform well, we find that they exhibit lower market volatility than
other recessions. This implies that the improved performance is not due to the
merit of ML methods but rather factors such as effective monetary policies that
stabilized the market. We recommend that ML practitioners evaluate their models
during both recessions and expansions.
( 2
min )
We propose a framework for descriptively analyzing sets of partial orders
based on the concept of depth functions. Despite intensive studies of depth
functions in linear and metric spaces, there is very little discussion on depth
functions for non-standard data types such as partial orders. We introduce an
adaptation of the well-known simplicial depth to the set of all partial orders,
the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a
comparison of machine learning algorithms based on multidimensional performance
measures. Concretely, we analyze the distribution of different classifier
performances over a sample of standard benchmark data sets. Our results
promisingly demonstrate that our approach differs substantially from existing
benchmarking approaches and, therefore, adds a new perspective to the vivid
debate on the comparison of classifiers.
( 2
min )
Support vector clustering is an important clustering method. However, it
suffers from a scalability issue due to its computationally expensive cluster
assignment step. In this paper we accelerate support vector clustering via
spectrum-preserving data compression. Specifically, we first compress the
original data set into a small set of spectrally representative aggregated
data points. Then, we perform standard support vector clustering on the
compressed data set. Finally, we map the clustering results of the compressed
data set back to discover the clusters in the original data set. Our extensive
experimental results on real-world data sets demonstrate dramatic speedups
over standard support vector clustering without sacrificing clustering quality.
( 2
min )
There isn’t a foolproof formula for building a successful digital firm — the risk of starting a business is high. There’s more to the frequently cited statistic that nine out of ten companies fail — a reason you should check out this step-by-step guide to starting a successful startup. The COVID-19 pandemic has put pressure…
The post 5 Crucial Steps To Starting A Successful Hi-Tech Startup: From Idea To Promotion appeared first on Data Science Central.
( 21
min )
From climate modeling to endangered species conservation, developers, researchers and companies are keeping an AI on the environment with the help of NVIDIA technology. They’re using NVIDIA GPUs and software to track endangered African black rhinos, forecast the availability of solar energy in the U.K., build detailed climate models and monitor environmental disasters from satellite…
( 7
min )
Content creators using Epic Games’ open, advanced real-time 3D creation tool, Unreal Engine, are now equipped with more features to bring their work to life with NVIDIA Omniverse, a platform for creating and operating metaverse applications. The Omniverse Connector for Unreal Engine’s 201.0 update brings significant enhancements to creative workflows using both open platforms. Streamlining…
( 6
min )
What’s the difference between NVIDIA GeForce RTX 30 and 40 Series GPUs for gamers? To briefly set aside the technical specifications, the difference lies in the level of performance and capability each series offers. Both deliver great graphics. Both offer advanced new features driven by NVIDIA’s global AI revolution a decade ago. Either can power…
( 6
min )
Batch inference is a common pattern where prediction requests are batched together on input, a job runs to process those requests against a trained model, and the output includes batch prediction responses that can then be consumed by other applications or business functions. Running batch use cases in production environments requires a repeatable process for […]
( 14
min )
The technology of MIT alumni-founded Hosta a.i. creates detailed property assessments from photos.
( 9
min )
Hello! Not sure if this is the right place to ask.
I am working on a startup, I was wondering what people think are some gaps in current machine learning infrastructure solutions like WandB, or Neptune.ai.
I'd love to know what people think are some missing features for products like these, or what completely new features they would like to see!
submitted by /u/spirited__tree
( 43
min )
Hi all,
Hope you are all well. Last time I posted about the fastLLaMa project on here, I had a lot of support from you guys and I really appreciated it. Motivated me to try random experiments and new things!
Thought I would give an update after a month.
Yesterday we added support to enable users to attach and detach LoRA adapters quickly during the runtime. This work was built on top of the original llama.cpp repo with some modifications that impact the adapter size (We are figuring out ways to reduce the adapter size through possible quantization).
We also built on top of our save/load feature to enable quick context switching during runtime! This should enable a single running instance to serve multiple sessions.
We were also grateful for the feature requests from the last post a…
( 46
min )
More than 50 automotive companies around the world have deployed over 800 autonomous test vehicles powered by the NVIDIA DRIVE Hyperion automotive compute architecture, which has recently achieved new safety milestones. The latest NVIDIA DRIVE Hyperion architecture is based on the DRIVE Orin system-on-a-chip (SoC). Many NVIDIA DRIVE processes, as well as hardware and software…
( 5
min )
GFN Thursday rolls up this week with a hot new deal for a GeForce NOW six-month Priority membership. Enjoy the cloud gaming service with seven new games to stream this week, including more favorites from Bandai Namco Europe and F1 2021 from Electronic Arts. Make Gaming a Priority Starting today, GeForce NOW is offering a…
( 6
min )
NVIDIA today recognized a dozen partners for their work helping customers in Europe, the Middle East and Africa harness the power of AI across industries. At a virtual EMEA Partner Day event, which was hosted by the NVIDIA Partner Network (NPN) and drew more than 750 registrants, Partner of the Year awards were given to…
( 6
min )
Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. With advancements in hardware design, a wide range of CPU- and GPU-based infrastructures are available to help you speed up inference performance. Also, you can build these ML systems with a combination of ML […]
( 11
min )
These tunable proteins could be used to create new materials with specific mechanical properties, like toughness or flexibility.
( 10
min )
This study introduces and investigates the capabilities of three different
text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet
Allocation, and Clustering Word Vectors, for automating code extraction from a
relatively small discussion board dataset. We compare the outputs of each
algorithm with a previous dataset that was manually coded by two human raters.
The results show that even with a relatively small dataset, automated
approaches can be an asset to course instructors by extracting some of the
discussion codes, which can be used in Epistemic Network Analysis.
( 2
min )
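As an illustration of the Latent Semantic Analysis step, a truncated SVD of a toy term-document matrix (invented here, not the study's discussion-board data) separates two latent discussion topics:

```python
import numpy as np

# toy term-document matrix: rows = terms, columns = 4 short posts
terms = ["grade", "exam", "forum", "reply"]
X = np.array([[2, 3, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2]], dtype=float)

# LSA: a truncated SVD keeps the top-k latent "topics"
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_topics = Vt[:k].T           # each document as a point in topic space
# documents 0-1 (grading talk) and 2-3 (forum talk) load on separate topics
top_topic = np.abs(doc_topics).argmax(axis=1)
```

The resulting document-topic loadings are what a coder would inspect (or feed into Epistemic Network Analysis) in place of manually assigned codes.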
Mining data streams is one of the main topics in the machine learning area
due to its applications in many knowledge areas. One of the major challenges
in mining data streams is concept drift, which requires the learner to discard
the
current concept and adapt to a new one. Ensemble-based drift detection
algorithms have been applied successfully to the classification task but
usually maintain a fixed-size ensemble of learners, running the risk of
needlessly
spending processing time and memory. In this paper we present improvements to
the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for
regression that employs social networks theory. In order to detect concept
drifts, SFNR uses the Adaptive Window (ADWIN) algorithm. Results show
improvements in accuracy, especially in concept drift situations, and better
performance compared to other state-of-the-art algorithms on both real and
synthetic data.
( 2
min )
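ADWIN itself maintains an adaptively sized window with statistical cut tests; the sketch below uses a much cruder fixed-window mean comparison (an illustrative stand-in, not ADWIN) just to show where such a detector sits in the stream-learning loop:

```python
from collections import deque

class WindowDriftDetector:
    """Toy sliding-window drift detector (a simplified stand-in for ADWIN):
    flags drift when the means of the older and newer halves of the window
    differ by more than `threshold`."""
    def __init__(self, window=40, threshold=1.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        self.buf.append(x)
        if len(self.buf) < self.buf.maxlen:
            return False          # not enough history yet
        half = len(self.buf) // 2
        old = list(self.buf)[:half]
        new = list(self.buf)[half:]
        return abs(sum(new) / len(new) - sum(old) / len(old)) > self.threshold

det = WindowDriftDetector()
stream = [0.0] * 60 + [3.0] * 60      # abrupt concept drift at t = 60
alarms = [t for t, x in enumerate(stream) if det.update(x)]
```

On an alarm, an ensemble method such as SFNR would discard or down-weight stale learners and start adapting to the new concept.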
Air quality is closely linked with the quality of life of humans, plants, and
wildlife, and it needs to be monitored and preserved continuously.
Transportation, industry, construction sites, generators, fireworks, and waste
burning account for a major share of air quality degradation, so these sources
must be operated in a safe and controlled manner. Traditional laboratory
analysis, or installing bulky and expensive monitoring stations every few
miles, is no longer efficient; smart devices are needed for collecting and
analyzing air data. Air quality depends on various factors, including
location, traffic, and time. Recent research applies machine learning
algorithms, big data technologies, and the Internet of Things to propose
stable and efficient models for this purpose. This review paper studies and
compiles recent research in the field, emphasizing data sources and monitoring
and forecasting models. Its main objective is to provide insight into the
ongoing research on improving the various aspects of air pollution models, and
it further casts light on open research issues and challenges.
( 2
min )
Successful deployment of artificial intelligence (AI) in various settings has
led to numerous positive outcomes for individuals and society. However, AI
systems have also been shown to harm parts of the population due to biased
predictions. We take a closer look at AI fairness and analyse how lack of AI
fairness can lead to deepening of biases over time and act as a social
stressor. If the issues persist, it could have undesirable long-term
implications on society, reinforced by interactions with other risks. We
examine current strategies for improving AI fairness, assess their limitations
in terms of real-world deployment, and explore potential paths forward to
ensure we reap AI's benefits without harming significant parts of the society.
( 2
min )
Advances in mobile communication capabilities open the door for closer
integration of pre-hospital and in-hospital care processes. For example,
medical specialists can be enabled to guide on-site paramedics and can, in
turn, be supplied with live vitals or visuals. Consolidating such
performance-critical applications with the highly complex workings of mobile
communications requires solutions both reliable and efficient, yet easy to
integrate with existing systems. This paper explores the application of Deep
Deterministic Policy Gradient (DDPG) methods for learning a communications
resource scheduling algorithm with special regard to priority users. Unlike
the popular Deep Q-Network methods, DDPG is able to produce
continuous-valued output. With light post-processing, the resulting scheduler
is able to achieve high performance on a flexible sum-utility goal.
( 2
min )
We study the training dynamics of shallow neural networks, in a two-timescale
regime in which the stepsizes for the inner layer are much smaller than those
for the outer layer. In this regime, we prove convergence of the gradient flow
to a global optimum of the non-convex optimization problem in a simple
univariate setting. The number of neurons need not be asymptotically large for
our result to hold, distinguishing our result from popular recent approaches
such as the neural tangent kernel or mean-field regimes. Experimental
illustration is provided, showing that the stochastic gradient descent behaves
according to our description of the gradient flow and thus converges to a
global optimum in the two-timescale regime, but can fail outside of this
regime.
( 2
min )
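The two-timescale regime amounts to nothing more than giving the inner (first-layer) weights a much smaller stepsize than the outer weights. A minimal univariate sketch (toy target, network width, and hyperparameters are assumed, not the paper's):

```python
import math, random

random.seed(1)
M = 8                                            # hidden neurons
w = [random.uniform(-1, 1) for _ in range(M)]    # inner-layer weights
a = [random.uniform(-1, 1) for _ in range(M)]    # outer-layer weights
ETA_OUTER, ETA_INNER = 0.05, 0.0005              # two timescales: inner << outer

def f(x):
    return sum(a[j] * math.tanh(w[j] * x) for j in range(M))

target = lambda x: math.tanh(2 * x)              # toy univariate target
xs = [i / 10 - 1 for i in range(21)]

def loss():
    return sum((f(x) - target(x)) ** 2 for x in xs) / len(xs)

before = loss()
for _ in range(500):
    x = random.choice(xs)                        # one stochastic gradient step
    err = f(x) - target(x)
    for j in range(M):
        ga = err * math.tanh(w[j] * x)           # gradient w.r.t. outer weight
        gw = err * a[j] * (1 - math.tanh(w[j] * x) ** 2) * x   # inner weight
        a[j] -= ETA_OUTER * ga
        w[j] -= ETA_INNER * gw
after = loss()
```

With the inner layer nearly frozen, the outer layer solves what is effectively a well-conditioned linear problem at each step, which is the intuition behind the convergence result.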
This paper introduces the QDQN-DPER framework to enhance the efficiency of
quantum reinforcement learning (QRL) in solving sequential decision tasks. The
framework incorporates prioritized experience replay and asynchronous training
into the training algorithm to reduce the high sampling complexities. Numerical
simulations demonstrate that QDQN-DPER outperforms the baseline distributed
quantum Q learning with the same model architecture. The proposed framework
holds potential for more complex tasks while maintaining training efficiency.
( 2
min )
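Prioritized experience replay, one of the framework's two ingredients, can be sketched in a few lines (the proportional variant; hyperparameters are illustrative, and the quantum Q-network itself is out of scope here):

```python
import random

random.seed(0)

class PrioritizedReplay:
    """Minimal proportional prioritized experience replay: transitions are
    sampled with probability proportional to |TD error|^alpha."""
    def __init__(self, alpha=0.6):
        self.data, self.prio = [], []
        self.alpha = alpha

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        idx = random.choices(range(len(self.data)), weights=self.prio, k=k)
        return [self.data[i] for i in idx]

    def update(self, i, td_error):
        # priorities are refreshed after TD errors are re-computed
        self.prio[i] = (abs(td_error) + 1e-6) ** self.alpha

buf = PrioritizedReplay()
buf.add(("low",), td_error=0.01)
buf.add(("high",), td_error=5.0)
batch = buf.sample(1000)
high_frac = sum(1 for t in batch if t[0] == "high") / len(batch)
# the high-error transition dominates the sampled batch
```

Sampling surprising transitions more often is what reduces the sampling complexity that the abstract refers to.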
We discuss the discontinuities that arise when mapping unordered objects to
neural network outputs of fixed permutation, referred to as the responsibility
problem. Prior work has proved the existence of the issue by identifying a
single discontinuity. Here, we show that discontinuities under such models are
uncountably infinite, motivating further research into neural networks for
unordered data.
( 2
min )
Prompt-based learning reformulates downstream tasks as cloze problems by
combining the original input with a template. This technique is particularly
useful in few-shot learning, where a model is trained on a limited amount of
data. However, the limited templates and text used in few-shot prompt-based
learning still leave significant room for performance improvement.
Additionally, existing methods using model ensembles can constrain the model
efficiency. To address these issues, we propose an augmentation method called
MixPro, which augments both the vanilla input text and the templates through
token-level, sentence-level, and epoch-level Mixup strategies. We conduct
experiments on five few-shot datasets, and the results show that MixPro
outperforms other augmentation baselines, improving model performance by an
average of 5.08% compared to before augmentation.
( 2
min )
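The token-level ingredient of this Mixup family can be sketched as follows (embedding values and α are invented for illustration; MixPro additionally mixes at the sentence and epoch level and also mixes the templates):

```python
import random

random.seed(0)

def mixup_embeddings(emb_a, emb_b, alpha=0.5):
    """Token-level Mixup: interpolate two embedding sequences token by token
    with a Beta-distributed coefficient."""
    lam = random.betavariate(alpha, alpha)
    mixed = [[lam * ea + (1 - lam) * eb for ea, eb in zip(ta, tb)]
             for ta, tb in zip(emb_a, emb_b)]
    return mixed, lam

# two toy "sentences" of 3 tokens with 2-dim embeddings each
a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
b = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
mixed, lam = mixup_embeddings(a, b)
```

The corresponding labels are interpolated with the same λ, producing virtual examples that soften the scarcity of few-shot data.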
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks".
( 2
min )
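Multicalibration asks, roughly, that predictions be simultaneously calibrated on every group in a rich collection. A toy diagnostic makes the notion concrete (a coarse proxy only: full multicalibration also conditions on the predictor's level sets, which this sketch ignores):

```python
def calibration_gap_by_group(preds, labels, groups):
    """Largest |mean prediction - mean outcome| per group: a crude proxy for
    how far a predictor is from (multi)calibration on that group."""
    gaps = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        mean_pred = sum(preds[i] for i in idx) / len(idx)
        mean_label = sum(labels[i] for i in idx) / len(idx)
        gaps[g] = abs(mean_pred - mean_label)
    return gaps

preds  = [0.9, 0.8, 0.2, 0.1]
labels = [1,   1,   1,   0]
groups = ["A", "A", "B", "B"]
gaps = calibration_gap_by_group(preds, labels, groups)
# group A is nearly calibrated on average; group B is badly miscalibrated
```

A squared-error-optimal large network, by the note's result, would drive all such group-wise gaps toward zero.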
Many machine learning methods assume that the training and test data follow
the same distribution. However, in the real world, this assumption is very
often violated. In particular, the phenomenon that the marginal distribution of
the data changes is called covariate shift, one of the most important research
topics in machine learning. We show that the well-known family of covariate
shift adaptation methods is unified in the framework of information geometry.
Furthermore, we show that parameter search for the geometrically generalized
covariate shift adaptation method can be performed efficiently. Numerical
experiments show that our generalization can achieve better performance than
the existing methods it encompasses.
( 2
min )
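The classical member of this family is importance weighting, where training samples are reweighted by the density ratio w(x) = p_test(x)/p_train(x). A sketch with two known Gaussians (in practice the ratio itself must be estimated; the distributions here are assumptions for illustration):

```python
import math, random

random.seed(0)

def density_ratio(x, mu_train, mu_test, sigma=1.0):
    """w(x) = p_test(x) / p_train(x) for two equal-variance Gaussians."""
    log_w = ((x - mu_train) ** 2 - (x - mu_test) ** 2) / (2 * sigma ** 2)
    return math.exp(log_w)

def weighted_mean(xs, mu_train, mu_test):
    """Self-normalized importance-weighted estimate of the test-distribution
    mean, computed from training samples only."""
    ws = [density_ratio(x, mu_train, mu_test) for x in xs]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

train = [random.gauss(0.0, 1.0) for _ in range(20000)]
est = weighted_mean(train, mu_train=0.0, mu_test=1.0)  # true test mean is 1.0
```

The information-geometric framework in the paper can be read as generalizing how such weights enter the learning objective.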
The recent advances in representation learning inspire us to take on the
challenging problem of unsupervised image classification tasks in a principled
way. We propose ContraCluster, an unsupervised image classification method that
combines clustering with the power of contrastive self-supervised learning.
ContraCluster consists of three stages: (1) contrastive self-supervised
pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3)
prototype-based semi-supervised fine-tuning (PB-SFT). CPS can select highly
accurate, categorically prototypical images in an embedding space learned by
contrastive learning. We use sampled prototypes as noisy labeled data to
perform semi-supervised fine-tuning (PB-SFT), leveraging small prototypes and
large unlabeled data to further enhance the accuracy. We demonstrate
empirically that ContraCluster achieves new state-of-the-art results for
standard benchmark datasets including CIFAR-10, STL-10, and ImageNet-10. For
example, ContraCluster achieves about 90.8% accuracy for CIFAR-10, which
outperforms DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin.
Without any labels, ContraCluster achieves 90.8% accuracy, which is comparable
to the 95.8% achieved by its best supervised counterpart.
( 2
min )
Sea surface temperature (SST) is uniquely important to the Earth's atmosphere
since its dynamics are a major force in shaping local and global climate and
profoundly affect our ecosystems. Accurate forecasting of SST brings
significant economic and social implications, for example, better preparation
for extreme weather such as severe droughts or tropical cyclones months ahead.
However, such a task faces unique challenges due to the intrinsic complexity
and uncertainty of ocean systems. Recently, deep learning techniques, such as
graph neural networks (GNNs), have been applied to address this task. Even
though these methods have some success, they frequently have serious drawbacks
when it comes to investigating dynamic spatiotemporal dependencies between
signals. To solve this problem, this paper proposes a novel static and dynamic
learnable personalized graph convolution network (SD-LPGC). Specifically, two
graph learning layers are first constructed to respectively model the stable
long-term and short-term evolutionary patterns hidden in the multivariate SST
signals. Then, a learnable personalized convolution layer is designed to fuse
this information. Our experiments on real SST datasets demonstrate the
state-of-the-art performances of the proposed approach on the forecasting task.
( 2
min )
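At its core a graph convolution, learnable-adjacency or not, is the product H = A·X·W: each node (here, an SST sensor) mixes its neighbours' features through the adjacency A and then applies a linear transform W. A minimal fixed-adjacency sketch (all values invented; in SD-LPGC the adjacency itself is learned):

```python
def matmul(A, B):
    """Plain dense matrix product (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def graph_conv(A, X, W):
    """One graph convolution step H = A @ X @ W."""
    return matmul(matmul(A, X), W)

A = [[0.5, 0.5],       # node 0 averages itself and node 1
     [0.0, 1.0]]       # node 1 keeps only its own signal
X = [[2.0], [4.0]]     # one scalar feature (e.g. an SST reading) per node
W = [[1.0]]            # identity transform for clarity
H = graph_conv(A, X, W)
```

Stacking such layers with separately learned "static" and "dynamic" adjacencies is, in outline, what the two graph learning layers of the paper do.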
Federated Learning (FL) aims to train a machine learning (ML) model in a
distributed fashion to strengthen data privacy with limited data migration
costs. It is a distributed learning framework naturally suitable for
privacy-sensitive medical imaging datasets. However, most current FL-based
medical imaging works assume silos have ground truth labels for training. In
practice, label acquisition in the medical field is challenging as it often
requires extensive labor and time costs. To address this challenge and leverage
the unannotated data silos to improve modeling, we propose an alternate
training-based framework, Federated Alternate Training (FAT), that alternates
training between annotated and unannotated data silos. Annotated
data silos exploit annotations to learn a reasonable global segmentation model.
Meanwhile, unannotated data silos use the global segmentation model as a target
model to generate pseudo labels for self-supervised learning. We evaluate the
performance of the proposed framework on two naturally partitioned Federated
datasets, KiTS19 and FeTS2021, and show its promising performance.
( 2
min )
Parkinson's disease (PD) has been found to affect 1 out of every 1000 people,
with a higher prevalence among the population above 60 years of age.
Leveraging wearable systems to find accurate biomarkers for diagnosis has
become the need of the hour, especially for a neurodegenerative condition like
Parkinson's. This work focuses on early-occurring, common symptoms, such as
motor- and gait-related parameters, to arrive at a quantitative analysis of
the feasibility of an economical and robust wearable device. A subset of the
Parkinson's Progression Markers Initiative (PPMI) Gait dataset has been
utilised for feature selection after a thorough analysis with various machine
learning algorithms. The identified influential features have then been used
to test real-time data for early detection of Parkinson's syndrome, with a
model accuracy of 91.9%.
( 2
min )
We apply Bayesian optimization and reinforcement learning to a problem in
topology: the question of when a knot bounds a ribbon disk. This question is
relevant in an approach to disproving the four-dimensional smooth Poincaré
conjecture; using our programs, we rule out many potential counterexamples to
the conjecture. We also show that the programs are successful in detecting many
ribbon knots in the range of up to 70 crossings.
( 2
min )
Precise estimation of cross-correlation or similarity between two random
variables lies at the heart of signal detection, hyperdimensional computing,
associative memories, and neural networks. Although a vast literature exists on
different methods for estimating cross-correlations, the question of which
method is the best and simplest for estimating cross-correlations from finite
samples remains open. In this paper, we first argue that the standard empirical
approach might not be the optimal method even though the estimator exhibits
uniform convergence to the true cross-correlation. Instead, we show that there
exists a large class of simple non-linear functions that can be used to
construct cross-correlators with a higher signal-to-noise ratio (SNR). To
demonstrate this, we first present a general mathematical framework using
Price's Theorem that allows us to analyze cross-correlators constructed using a
mixture of piece-wise linear functions. Using this framework and
high-dimensional embedding, we show that some of the most promising
cross-correlators are based on Huber's loss functions, margin-propagation (MP)
functions, and the log-sum-exp functions.
( 2
min )
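The contrast between the standard empirical correlator and a Huber-style one can be seen on a deterministic toy sample with a single corrupted entry (numbers invented for illustration; the paper's MP and log-sum-exp correlators are not reproduced here):

```python
def huber_clip(x, delta=1.0):
    """Derivative of the Huber loss: identity near zero, clipped beyond delta."""
    return max(-delta, min(delta, x))

def empirical_corr(xs, ys):
    """Standard empirical cross-correlator: mean of products."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

def huber_corr(xs, ys, delta=1.0):
    """Cross-correlator built from the Huber influence function: heavy-tailed
    samples are clipped, trading a little bias for much lower variance."""
    return sum(huber_clip(x, delta) * huber_clip(y, delta)
               for x, y in zip(xs, ys)) / len(xs)

xs = [0.5, -0.5, 0.5, -0.5, 100.0]   # last sample is a corrupted outlier
ys = [0.5, -0.5, 0.5, -0.5, 100.0]
emp = empirical_corr(xs, ys)         # dominated by the single outlier
rob = huber_corr(xs, ys)             # outlier contributes at most delta^2
```

One bad sample drives `emp` above 2000, while `rob` stays at 0.4; this variance reduction is the SNR gain the paper analyzes via Price's Theorem.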
We extend the global convergence result of Chatterjee
\cite{chatterjee2022convergence} by considering the stochastic gradient descent
(SGD) for non-convex objective functions. With minimal additional assumptions
that can be realized by finitely wide neural networks, we prove that if we
initialize inside a local region where the \L{}ojasiewicz condition holds, with
a positive probability, the stochastic gradient iterates converge to a global
minimum inside this region. A key component of our proof is to ensure that the
whole trajectories of SGD stay inside the local region with a positive
probability. For that, we assume the SGD noise scales with the objective
function, which is called machine learning noise and is achievable in many real
examples. Furthermore, we provide a negative argument to show why using the
boundedness of noise with Robbins-Monro type step sizes is not enough to keep
the key component valid.
( 2
min )
Spatiotemporal (ST) data collected by sensors can be represented as
multi-variate time series, which is a sequence of data points listed in an
order of time. Despite the vast amount of useful information, the ST data
usually suffer from the issue of missing or incomplete data, which also limits
their applications. Imputation is one viable solution and is often used to
preprocess the data for further applications. In practice, however,
spatiotemporal data imputation is quite difficult due to the complexity of
spatiotemporal dependencies with dynamic changes in the traffic network, and
it is a crucial preprocessing task for further applications. Existing
approaches mostly capture only the temporal dependencies in time series or
static spatial dependencies; they fail to directly model the spatiotemporal
dependencies, and their representation ability is relatively limited.
( 2
min )
Running complex sets of machine learning experiments is challenging and
time-consuming due to the lack of a unified framework. This leaves researchers
forced to spend time implementing necessary features such as parallelization,
caching, and checkpointing themselves instead of focussing on their project. To
simplify the process, in this paper, we introduce Memento, a Python package
that is designed to aid researchers and data scientists in the efficient
management and execution of computationally intensive experiments. Memento has
the capacity to streamline any experimental pipeline by providing a
straightforward configuration matrix and the ability to concurrently run
experiments across multiple threads. A demonstration of Memento is available
at: https://wickerlab.org/publication/memento.
( 2
min )
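A configuration matrix in this style is just the cross product of parameter value lists. A sketch of the idea (the dictionary layout and key names are illustrative, not Memento's actual API):

```python
from itertools import product

def expand_matrix(matrix):
    """Expand a configuration matrix into the list of all experiment
    configurations: the cross product of every parameter's values."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*matrix.values())]

matrix = {"dataset": ["iris", "wine"],
          "model": ["svm", "tree"],
          "seed": [0, 1, 2]}
configs = expand_matrix(matrix)   # 2 * 2 * 3 = 12 experiment configurations
```

Each resulting configuration is an independent unit of work, which is what makes the caching, checkpointing, and multi-threaded execution mentioned above straightforward to layer on top.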
In this work we establish an algorithm and distribution independent
non-asymptotic trade-off between the model size, excess test loss, and training
loss of linear predictors. Specifically, we show that models that perform well
on the test data (have low excess loss) are either "classical" -- have training
loss close to the noise level, or are "modern" -- have a much larger number of
parameters compared to the minimum needed to fit the training data exactly.
We also provide a more precise asymptotic analysis when the limiting spectral
distribution of the whitened features is Marchenko-Pastur. Remarkably, while
the Marchenko-Pastur analysis is far more precise near the interpolation peak,
where the number of parameters is just enough to fit the training data, it
coincides exactly with the distribution independent bound as the level of
overparametrization increases.
( 2
min )
Repo: https://github.com/h2oai/h2ogpt
From the repo:
- Open-source repository with fully permissive, commercially usable code, data and models
- Code for preparing large open-source datasets as instruction datasets for fine-tuning of large language models (LLMs), including prompt engineering
- Code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi node)
- Code to run a chatbot on a GPU server, with shareable end-point with Python client API
- Code to evaluate and compare the performance of fine-tuned LLMs
submitted by /u/luizluiz
( 43
min )
Code & Demo: https://github.com/z-x-yang/Segment-and-Track-Anything
WebUI App is also available
submitted by /u/liulei-li
( 43
min )
The ability to effectively handle and process enormous amounts of documents has become essential for enterprises in the modern world. Due to the continuous influx of information that all enterprises deal with, manually classifying documents is no longer a viable option. Document classification models can automate the procedure and help organizations save time and resources. […]
( 10
min )
Businesses are increasingly using machine learning (ML) to make near-real-time decisions, such as placing an ad, assigning a driver, recommending a product, or even dynamically pricing products and services. ML models make predictions given a set of input data known as features, and data scientists easily spend more than 60% of their time designing and […]
( 15
min )
This is a guest post co-written with Fred Wu from Sportradar. Sportradar is the world’s leading sports technology company, at the intersection between sports, media, and betting. More than 1,700 sports federations, media outlets, betting operators, and consumer platforms across 120 countries rely on Sportradar knowhow and technology to boost their business. Sportradar uses data […]
( 10
min )
MIT researchers exhibit a new advancement in autonomous drone navigation, using brain-inspired liquid neural networks that excel in out-of-distribution scenarios.
( 9
min )
Shanghai is once again showing why it’s called the “Magic City” as more than 1,000 exhibitors from 20 countries dazzle the automotive world this week at the highly anticipated International Automobile Industry Exhibition. With nearly 1,500 vehicles on display, the 20th edition of Auto Shanghai is showcasing the newest AI-powered cars and mobility solutions.
( 8
min )
This week’s In the NVIDIA Studio artists specializing in 3D, Gianluca Squillace and Pasquale Scionti, benefitted from just that — in their individual work and in collaborating to construct the final scene for their project, Cold Inside Diorama.
( 7
min )
For many people, opening door handles or moving a pen between their fingers is a movement that happens multiple times a day, often without much thought. For a robot, however, these movements aren’t always so easy. In reinforcement learning, robots learn to perform tasks by exploring their environments, receiving signals along the way that indicate […]
The post Unifying learning from preferences and demonstration via a ranking game for imitation learning appeared first on Microsoft Research.
( 15
min )
I developed a simple traffic simulator with five cars, and I want to improve the cars' driving using basic reinforcement learning.
I used tkinter to render and display the maps, but I found that tkinter can't support maps with more than 20 rows and columns on my machine (Mac M1 mini). I don't know how to display bigger maps with more rows and columns.
I'd be very grateful for any suggestions.
github repositories: https://github.com/wa008/reinforcement-learning
submitted by /u/waa007
( 42
min )
In this paper, we introduce four main novelties: First, we present a new way
of handling the topology problem of normalizing flows. Second, we describe a
technique to enforce certain classes of boundary conditions onto normalizing
flows. Third, we introduce the I-Spline bijection, which, similar to previous
work, leverages splines but, in contrast to those works, can be made
arbitrarily often differentiable. And finally, we use these techniques to
create Waveflow, an Ansatz for the one-space-dimensional multi-particle
fermionic wave functions in real space based on normalizing flows, that can be
efficiently trained with Variational Quantum Monte Carlo, without the need for
MCMC or estimation of a normalization constant. To enforce the necessary
anti-symmetry of fermionic wave functions, we train the normalizing flow only
on the fundamental domain of the permutation group, which effectively reduces
it to a boundary value problem.
( 2
min )
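The fundamental-domain trick can be sketched as follows (my own illustration; function names are hypothetical): the model is only ever evaluated on sorted particle coordinates, and antisymmetry is restored by the parity of the sorting permutation.

```python
import numpy as np

def to_fundamental_domain(x):
    """Sort particle coordinates; also return the sorting permutation's parity."""
    perm = np.argsort(x)
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return x[perm], (-1) ** inversions

def psi_antisymmetric(x, psi_on_domain):
    """Antisymmetric extension of a wave function defined on sorted inputs."""
    x_sorted, sign = to_fundamental_domain(np.asarray(x, dtype=float))
    return sign * psi_on_domain(x_sorted)

# Placeholder standing in for the trained flow; it vanishes on the domain
# boundary (coincident particles), as the boundary-value formulation requires.
psi = lambda xs: float(np.prod(np.exp(-xs ** 2)) * np.prod(np.diff(xs)))

a = psi_antisymmetric([0.3, -1.0, 0.7], psi)
b = psi_antisymmetric([-1.0, 0.3, 0.7], psi)   # one pair of particles swapped
print(b == -a)   # True: swapping two particles flips the sign
```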
The article reviews significant advances in networked signal and information
processing, which over the last 25 years have extended decision making and
inference, optimization, control, and learning to the increasingly ubiquitous
environments of distributed agents. As these interacting agents
cooperate, new collective behaviors emerge from local decisions and actions.
Moreover, and significantly, theory and applications show that networked
agents, through cooperation and sharing, are able to match the performance of
cloud or federated solutions, while offering the potential for improved
privacy, increasing resilience, and saving resources.
( 2
min )
This paper proposes a novel centralized training and distributed execution
(CTDE)-based multi-agent deep reinforcement learning (MADRL) method for
multiple unmanned aerial vehicles (UAVs) control in autonomous mobile access
applications. For this purpose, a single neural network is utilized in
centralized training for cooperation among multiple agents while maximizing the
total quality of service (QoS) in mobile access applications.
( 2
min )
Consumers' privacy is a major concern in Smart Grids (SGs) due to the
sensitivity of energy data, particularly when used to train machine learning
models for different services. These data-driven models often require huge
amounts of data to achieve acceptable performance, leading in most cases to
risks of privacy leakage. By pushing the training to the edge, Federated
Learning (FL) offers a good compromise between privacy preservation and the
predictive performance of these models. The current paper presents an overview
of FL applications in SGs while discussing their advantages and drawbacks,
mainly in load forecasting, electric vehicles, fault diagnosis, load
disaggregation and renewable energies. In addition, an analysis of main design
trends and possible taxonomies is provided considering data partitioning, the
communication topology, and security mechanisms. Towards the end, an overview
of main challenges facing this technology and potential future directions is
presented.
( 2
min )
This paper presents the approach and results of USC SAIL's submission to the
Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting
relapses in psychotic patients. Relapse prediction has proven to be
challenging, primarily due to the heterogeneity of symptoms and responses to
treatment between individuals. We address these challenges by investigating the
use of sleep behavior features to estimate relapse days as outliers in an
unsupervised machine learning setting. We extract informative features from
human activity and heart rate data collected in the wild, and evaluate various
combinations of feature types and time resolutions. We found that short-time
sleep-behavior features outperformed both their awake counterparts and features
computed over longer time intervals. Our submission was ranked 3rd on the
Task's official leaderboard,
demonstrating the potential of such features as an objective and non-invasive
predictor of psychotic relapses.
( 2
min )
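The outlier framing can be sketched as follows (an illustrative robust-score detector on synthetic data, not the authors' actual features or model): each day's sleep-behavior feature vector is scored against a robust per-feature baseline, and days with extreme scores are flagged as candidate relapse days.

```python
import numpy as np

def outlier_scores(daily_features):
    """Robust z-score per feature (median / MAD); worst feature per day."""
    X = np.asarray(daily_features, dtype=float)      # shape (days, features)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-9  # robust scale estimate
    z = np.abs(X - med) / mad                        # robust z-scores
    return z.max(axis=1)

rng = np.random.default_rng(0)
normal_days = rng.normal(0.0, 1.0, size=(30, 4))    # 30 typical days
relapse_day = np.full((1, 4), 8.0)                  # grossly atypical sleep
scores = outlier_scores(np.vstack([normal_days, relapse_day]))
print(int(np.argmax(scores)))  # 30 -- the injected relapse day
```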
Fetal standard scan plane detection during 2-D mid-pregnancy examinations is
a highly complex task, which requires extensive medical knowledge and years of
training. Although deep neural networks (DNN) can assist inexperienced
operators in these tasks, their lack of transparency and interpretability limit
their application. Although some researchers have been committed to visualizing
the decision process of DNNs, most of them focus only on pixel-level features
and do not take into account medical prior knowledge. In this
work, we propose an interpretable framework based on key medical concepts,
which provides explanations from the perspective of clinicians' cognition.
Moreover, we utilize a concept-based graph convolutional network (GCN) to
construct the relationships between key medical concepts. Extensive
experimental analysis on a private dataset has shown that the proposed method
provides easy-to-understand insights about reasoning results for clinicians.
( 2
min )
Self-supervised monocular depth estimation approaches not only suffer from
scale ambiguity but also infer temporally inconsistent depth maps w.r.t. scale.
While disambiguating scale during training is not possible without some kind of
ground truth supervision, having scale consistent depth predictions would make
it possible to calculate scale once during inference as a post-processing step
and use it over-time. With this as a goal, a set of temporal consistency losses
that minimize pose inconsistencies over time are introduced. Evaluations show
that introducing these constraints not only reduces depth inconsistencies but
also improves the baseline performance of depth and ego-motion prediction.
( 2
min )
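One common form of such a temporal consistency loss (a hypothetical sketch; the paper's exact losses may differ) penalizes disagreement between chained adjacent-frame poses and the directly predicted longer-interval pose, which discourages per-frame scale drift:

```python
import numpy as np

def pose_consistency_loss(T_01, T_12, T_02):
    """Frobenius penalty between chained and directly predicted 4x4 poses."""
    return float(np.linalg.norm(T_12 @ T_01 - T_02) ** 2)

def translation(t):
    """Build a 4x4 homogeneous transform that translates by t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

T_01 = translation([0.1, 0.0, 0.0])      # predicted pose, frame 0 -> 1
T_12 = translation([0.1, 0.0, 0.0])      # predicted pose, frame 1 -> 2
consistent = pose_consistency_loss(T_01, T_12, translation([0.2, 0.0, 0.0]))
scale_drift = pose_consistency_loss(T_01, T_12, translation([0.3, 0.0, 0.0]))
print(consistent, scale_drift)           # zero loss vs. a positive penalty
```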
In this paper, we primarily focus on understanding the data preprocessing
pipeline for DNN Training in the public cloud. First, we run experiments to
test the performance implications of the two major data preprocessing methods
using either raw data or record files. The preliminary results show that data
preprocessing is a clear bottleneck, even with the most efficient software and
hardware configuration enabled by NVIDIA DALI, a highly optimized data
preprocessing library. Second, we identify the potential causes, exercise a
variety of optimization methods, and present their pros and cons. We hope this
work will shed light on the new co-design of ``data storage, loading pipeline''
and ``training framework'' and flexible resource configurations between them so
that the resources can be fully exploited and performance can be maximized.
( 2
min )
In this paper, we extend the original Neural Collapse phenomenon by proving
the Generalized Neural Collapse hypothesis. We obtain the Grassmannian Frame
structure from the optimization and generalization of classification. This
structure maximally separates the features of every two classes on a sphere and
does not require a feature dimension larger than the number of classes. Out of
curiosity about the symmetry of the Grassmannian Frame, we conduct experiments
to explore whether models with different Grassmannian Frames have different
performance. As a
result, we discover the Symmetric Generalization phenomenon. We provide a
theorem to explain Symmetric Generalization of permutation. However, the
question of why different directions of features can lead to such different
generalization is still open for future investigation.
( 2
min )
Robotic grasping in highly noisy environments presents complex challenges,
especially with limited prior knowledge about the scene. In particular,
identifying good grasping poses with Bayesian inference becomes difficult due
to two reasons: i) generating data from uninformative priors proves to be
inefficient, and ii) the posterior often entails a complex distribution defined
on a Riemannian manifold. In this study, we explore the use of implicit
representations to construct scene-dependent priors, thereby enabling the
application of efficient simulation-based Bayesian inference algorithms for
determining successful grasp poses in unstructured environments. Results from
both simulation and physical benchmarks showcase the high success rate and
promising potential of this approach.
( 2
min )
In this paper, we describe a method for estimating the joint probability
density from data samples by assuming that the underlying distribution can be
decomposed as a mixture of product densities with few mixture components. Prior
works have used such a decomposition to estimate the joint density from
lower-dimensional marginals, which can be estimated more reliably with the same
number of samples. We combine two key ideas: dictionaries to represent 1-D
densities, and random projections to estimate the joint distribution from 1-D
marginals, explored separately in prior work. Our algorithm benefits from
improved sample complexity over the previous dictionary-based approach by using
1-D marginals for reconstruction. We evaluate the performance of our method on
estimating synthetic probability densities and compare it with the previous
dictionary-based approach and Gaussian Mixture Models (GMMs). Our algorithm
outperforms these other approaches in all the experimental settings.
( 2
min )
This study presents a benchmark for evaluating action-constrained
reinforcement learning (RL) algorithms. In action-constrained RL, each action
taken by the learning system must comply with certain constraints. These
constraints are crucial for ensuring the feasibility and safety of actions in
real-world systems. We evaluate existing algorithms and their novel variants
across multiple robotics control environments, encompassing multiple action
constraint types. Our evaluation provides the first in-depth perspective of the
field, revealing surprising insights, including the effectiveness of a
straightforward baseline approach. The benchmark problems and associated code
utilized in our experiments are made available online at
github.com/omron-sinicx/action-constrained-RL-benchmark for further research
and development.
( 2
min )
Trained computer vision models are assumed to solve vision tasks by imitating
human behavior learned from training labels. Most efforts in recent vision
research focus on measuring the model task performance using standardized
benchmarks. Limited work has been done to understand the perceptual difference
between humans and machines. To fill this gap, our study first quantifies and
analyzes the statistical distributions of mistakes from the two sources. We
then explore human vs. machine expertise after ranking tasks by difficulty
levels. Even when humans and machines have similar overall accuracies, the
distribution of answers may vary. Leveraging the perceptual difference between
humans and machines, we empirically demonstrate a post-hoc human-machine
collaboration that outperforms humans or machines alone.
( 2
min )
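A minimal sketch of one possible post-hoc combination rule, assuming a confidence-based deferral scheme (the paper's actual mechanism may differ): keep the machine's answer when its confidence is high, otherwise defer to the human.

```python
import numpy as np

def collaborate(machine_pred, machine_conf, human_pred, threshold=0.8):
    """Use the machine's prediction when confident, else defer to the human."""
    machine_conf = np.asarray(machine_conf)
    return np.where(machine_conf >= threshold, machine_pred, human_pred)

y_true  = np.array([0, 1, 1, 0, 1])
machine = np.array([0, 1, 0, 0, 0])   # wrong on its low-confidence items 2 and 4
conf    = np.array([0.95, 0.9, 0.55, 0.99, 0.6])
human   = np.array([0, 0, 1, 0, 1])   # wrong on item 1

combined = collaborate(machine, conf, human)
print((combined == y_true).mean())    # 1.0 -- better than either source alone
```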
We present LTC-SE, an improved version of the Liquid Time-Constant (LTC)
neural network algorithm originally proposed by Hasani et al. in 2021. This
algorithm unifies the Leaky-Integrate-and-Fire (LIF) spiking neural network
model with Continuous-Time Recurrent Neural Networks (CTRNNs), Neural Ordinary
Differential Equations (NODEs), and bespoke Gated Recurrent Units (GRUs). The
enhancements in LTC-SE focus on augmenting flexibility, compatibility, and code
organization, targeting the unique constraints of embedded systems with limited
computational resources and strict performance requirements. The updated code
serves as a consolidated class library compatible with TensorFlow 2.x, offering
comprehensive configuration options for LTCCell, CTRNN, NODE, and CTGRU
classes. We evaluate LTC-SE against its predecessors, showcasing the advantages
of our optimizations in user experience, Keras function compatibility, and code
clarity. These refinements expand the applicability of liquid neural networks
in diverse machine learning tasks, such as robotics, causality analysis, and
time-series prediction, and build on the foundational work of Hasani et al.
( 2
min )
Modern deep models for summarization attain impressive benchmark
performance, but they are prone to generating miscalibrated predictive
uncertainty. This means that they assign high confidence to low-quality
predictions, leading to compromised reliability and trustworthiness in
real-world applications. Probabilistic deep learning methods are common
solutions to the miscalibration problem. However, their relative effectiveness
in complex autoregressive summarization tasks is not well understood. In this
work, we thoroughly investigate different state-of-the-art probabilistic
methods' effectiveness in improving the uncertainty quality of the neural
summarization models, across three large-scale benchmarks with varying
difficulty. We show that the probabilistic methods consistently improve the
model's generation and uncertainty quality, leading to improved selective
generation performance (i.e., abstaining from low-quality summaries) in
practice. We also reveal notable failure patterns of probabilistic methods
widely adopted in the NLP community (e.g., Deep Ensemble and Monte Carlo
Dropout), underscoring the importance of choosing an appropriate method for the
data setting.
( 2
min )
In this paper we study a class of constrained minimax problems. In
particular, we propose a first-order augmented Lagrangian method for solving
them, whose subproblems turn out to be a much simpler structured minimax
problem and are suitably solved by a first-order method recently developed in
[26] by the authors. Under some suitable assumptions, an \emph{operation
complexity} of ${\cal O}(\varepsilon^{-4}\log\varepsilon^{-1})$, measured by
its fundamental operations, is established for the first-order augmented
Lagrangian method for finding an $\varepsilon$-KKT solution of the constrained
minimax problems.
( 2
min )
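For concreteness, a generic first-order augmented Lagrangian scheme for a constrained minimax problem $\min_x \max_y f(x,y)$ subject to $g(x) \le 0$ (stated here in a standard textbook form; the paper's exact construction may differ) replaces the constraint by the penalized subproblem

\[
\min_x \max_y \; \mathcal{L}_\rho(x,y,\lambda) \;=\; f(x,y) \;+\; \frac{\rho}{2}\left( \left\| \left[\tfrac{\lambda}{\rho} + g(x)\right]_+ \right\|^2 - \left\| \tfrac{\lambda}{\rho} \right\|^2 \right),
\]

followed by the multiplier update $\lambda \leftarrow [\lambda + \rho\, g(x)]_+$ with a gradually increasing penalty parameter $\rho$. Each such subproblem is the simpler structured minimax problem handed to the inner first-order solver.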
The Linear-Quadratic Regulation (LQR) problem with unknown system parameters
has been widely studied, but it has remained unclear whether $\tilde{
\mathcal{O}}(\sqrt{T})$ regret, which is the best known dependence on time, can
be achieved almost surely. In this paper, we propose an adaptive LQR controller
with almost surely $\tilde{ \mathcal{O}}(\sqrt{T})$ regret upper bound. The
controller features a circuit-breaking mechanism, which circumvents potential
safety breaches and guarantees the convergence of the system parameter estimate,
but is shown to be triggered only finitely often and hence has negligible
effect on the asymptotic performance of the controller. The proposed controller
is also validated via simulation on Tennessee Eastman Process~(TEP), a commonly
used industrial process example.
( 2
min )
In this paper, a critical bibliometric analysis study is conducted, coupled
with an extensive literature survey on recent developments and associated
applications in machine learning research with a perspective on Africa. The
presented bibliometric analysis study consists of 2761 machine learning-related
documents, of which 98% were articles with at least 482 citations published in
903 journals during the past 30 years. Furthermore, the collated documents were
retrieved from the Science Citation Index EXPANDED, comprising research
publications from 54 African countries between 1993 and 2021. The bibliometric
study visualizes the current landscape and future trends in
machine learning research and its applications, to facilitate future
collaborative research and knowledge exchange among authors from different
research institutions scattered across the African continent.
( 2
min )
Chen et al. [Chen2022] recently published the article 'Fast and scalable
search of whole-slide images via self-supervised deep learning' in Nature
Biomedical Engineering. The authors call their method 'self-supervised image
search for histology', short SISH. We express our concerns that SISH is an
incremental modification of Yottixel, has used MinMax binarization but does not
cite the original works, and is based on a misnomer 'self-supervised image
search'. We also point to several other concerns regarding the experiments and
comparisons performed by Chen et al.
( 2
min )
Adaptation-relevant predictions of climate change are often derived by
combining climate model simulations in a multi-model ensemble. Model evaluation
methods used in performance-based ensemble weighting schemes have limitations
in the context of high-impact extreme events. We introduce a locally
time-invariant method for evaluating climate model simulations with a focus on
assessing the simulation of extremes. We explore the behaviour of the proposed
method in predicting extreme heat days in Nairobi and provide comparative
results for eight additional cities.
( 2
min )
Enabling resilient autonomous motion planning requires robust predictions of
surrounding road users' future behavior. In response to this need and the
associated challenges, we introduce our model titled MTP-GO. The model encodes
the scene using temporal graph neural networks to produce the inputs to an
underlying motion model. The motion model is implemented using neural ordinary
differential equations where the state-transition functions are learned with
the rest of the model. Multimodal probabilistic predictions are obtained by
combining the concept of mixture density networks and Kalman filtering. The
results illustrate the predictive capabilities of the proposed model across
various data sets, outperforming several state-of-the-art methods on a number
of metrics.
( 2
min )
Nowadays, face recognition systems surpass human performance on several
datasets. However, there are still edge cases that the machine can't correctly
classify. This paper investigates the effect of a combination of machine and
human operators in the face verification task. First, we take a closer look at
the edge cases for several state-of-the-art models to discover challenging
settings common across datasets. Then, we conduct a study with 60 participants on these
selected tasks with humans and provide an extensive analysis. Finally, we
demonstrate that combining machine and human decisions can further improve the
performance of state-of-the-art face verification systems on various benchmark
datasets. Code and data are publicly available on GitHub.
( 2
min )
Stochastic gradient Langevin dynamics (SGLD) are a useful methodology for
sampling from probability distributions. This paper provides a finite sample
analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD)
designed to achieve inverse reinforcement learning. By "passive", we mean that
the noisy gradients available to the PSGLD algorithm (inverse learning process)
are evaluated at randomly chosen points by an external stochastic gradient
algorithm (forward learner). The PSGLD algorithm thus acts as a randomized
sampler which recovers the cost function being optimized by this external
process. Previous work has analyzed the asymptotic performance of this passive
algorithm using stochastic approximation techniques; in this work we analyze
the non-asymptotic performance. Specifically, we provide finite-time bounds on
the 2-Wasserstein distance between the passive algorithm and its stationary
measure, from which the reconstructed cost function is obtained.
( 2
min )
There is an increasing interest in the development of new data-driven models
useful to assess the performance of communication networks. For many
applications, like network monitoring and troubleshooting, a data model is of
little use if it cannot be interpreted by a human operator. In this paper, we
present an extension of the Multivariate Big Data Analysis (MBDA) methodology,
a recently proposed interpretable data analysis tool. In this extension, we
propose a solution to the automatic derivation of features, a cornerstone step
for the application of MBDA when the amount of data is massive. The resulting
network monitoring approach allows us to detect and diagnose disparate network
anomalies, with a data-analysis workflow that combines the advantages of
interpretable and interactive models with the power of parallel processing. We
apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based
real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and
largest Wi-Fi trace known to date.
( 2
min )
Data - https://github.com/allenai/mmc4
submitted by /u/MysteryInc152
( 43
min )
Data warehouses are at the heart of any organization’s technology ecosystem. The emergence of cloud technology has enabled data warehouses to offer capabilities such as cost-effective data storage, scalable computing and storage, utilization-based pricing, and fully managed service delivery. As data consumption increases and more people live and work remotely, companies are adopting modern data…
The post Why It’s Important to Change Misconceptions About Data Warehouse Technology appeared first on Data Science Central.
( 21
min )
Three years after the outbreak of the COVID-19 pandemic, the lingering impacts of the viral outbreak and the risk of another deadly pathogen spreading around the world remain. The pandemic challenged every health system in the world, stressing facilities, medical equipment suppliers, and medical personnel. Public health authorities tracked disease transmission, modeled forecasts across multiple…
The post How Informatics, ML, and AI Can Better Prepare the Healthcare Industry for the Next Global Pandemic appeared first on Data Science Central.
( 21
min )
Artificial Intelligence (AI) is sweeping the globe, leaving no stone unturned as it reshapes industries far and wide.
The post Harnessing the Power of OpenAI Technology: 5 Innovative Marketing Tools appeared first on Data Science Central.
( 20
min )
Large language models (LLMs) with billions of parameters are currently at the forefront of natural language processing (NLP). These models are shaking up the field with their incredible abilities to generate text, analyze sentiment, translate languages, and much more. With access to massive amounts of data, LLMs have the potential to revolutionize the way we […]
( 18
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML), enabling organizations to provide relevant information to customers and employees when they need it. Amazon Kendra uses ML algorithms to enable users to use natural language queries to search for information scattered across multiple data sources in an enterprise, including commonly used document […]
( 7
min )
This post was co-written with Dave Gowel, CEO of RallyPoint. In his own words, “RallyPoint is an online social and professional network for veterans, service members, family members, caregivers, and other civilian supporters of the US armed forces. With two million members on the platform, the company provides a comfortable place for this deserving population […]
( 9
min )
Reliability managers and technicians in industrial environments such as manufacturing production lines, warehouses, and industrial plants are keen to improve equipment health and uptime to maximize product output and quality. Machine and process failures are often addressed by reactive activity after incidents happen or by costly preventive maintenance, where you run the risk of over-maintaining […]
( 16
min )
In the first two blog posts in this series, we presented our vision for Cloud Intelligence/AIOps (AIOps) research, and scenarios where innovations in AI technologies can help build and operate complex cloud platforms and services effectively and efficiently at scale. In this blog post, we dive deeper into our efforts to automatically manage large-scale cloud […]
The post Automatic post-deployment management of cloud applications appeared first on Microsoft Research.
( 15
min )
Sparked by the release of large AI models like AlexaTM, GPT, OpenChatKit, BLOOM, GPT-J, GPT-NeoX, FLAN-T5, OPT, Stable Diffusion, and ControlNet, the popularity of generative AI has seen a recent boom. Businesses are beginning to evaluate new cutting-edge applications of the technology in text, image, audio, and video generation that have the potential to revolutionize […]
( 18
min )
“Instead of focusing on the code, companies should focus on developing systematic engineering practices for improving data in ways that are reliable, efficient, and systematic. In other words, companies need to move from a model-centric approach to a data-centric approach.” – Andrew Ng A data-centric AI approach involves building AI systems with quality data involving […]
( 10
min )
As more businesses increase their online presence to serve their customers better, new fraud patterns are constantly emerging. In today’s ever-evolving digital landscape, where fraudsters are becoming more sophisticated in their tactics, detecting and preventing such fraudulent activities has become paramount for companies and financial institutions. Traditional rule-based fraud detection systems are capped in their […]
( 9
min )
RStudio on Amazon SageMaker is the industry’s first fully managed RStudio Workbench integrated development environment (IDE) in the cloud. You can quickly launch the familiar RStudio IDE and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) and analytics solutions in R at scale. […]
( 7
min )
The Dask release 2023.2.1 introduced a new shuffling method called P2P for dask.dataframe, making sorts, merges, and joins faster while using constant memory. This article describes the problem, the new solution, and the impact on performance.
https://medium.com/coiled-hq/shuffling-large-data-at-constant-memory-in-dask-bb683e92d70b
submitted by /u/dask-jeeves
( 43
min )
At the Hannover Messe trade show this week, Siemens unveiled a digital model of next-generation FREYR Battery factories that was developed using NVIDIA technology. The model was created in part to highlight a strategic partnership announced Monday by Siemens and FREYR, with Siemens becoming FREYR’s preferred supplier in automation technology, enabling the Norway-based group to Read article >
( 5
min )
Microsoft has made significant contributions to the prestigious USENIX NSDI’23 conference, which brings together experts in computer networks and distributed systems. A silver sponsor for the conference, Microsoft is a leader in developing innovative technologies for networking, and we are proud to have contributed to 30 papers accepted this year. Our team members also served […]
The post Microsoft at NSDI 2023: A commitment to advancing networking and distributed systems appeared first on Microsoft Research.
( 13
min )
This work addresses large dimensional covariance matrix estimation with
unknown mean. The empirical covariance estimator fails when dimension and
number of samples are proportional and tend to infinity, settings known as
Kolmogorov asymptotics. When the mean is known, Ledoit and Wolf (2004) proposed
a linear shrinkage estimator and proved its convergence under those
asymptotics. To the best of our knowledge, no formal proof has been proposed
when the mean is unknown. To address this issue, we propose a new estimator and
prove its quadratic convergence under the Ledoit and Wolf assumptions. Finally,
we show empirically that it outperforms other standard estimators.
( 2
min )
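The demean-then-shrink idea can be sketched as follows, with an illustrative fixed shrinkage intensity (the paper and Ledoit-Wolf derive a data-driven optimal intensity, which is the hard part): the unknown mean is handled by centering the data before shrinking the sample covariance toward a scaled identity.

```python
import numpy as np

def linear_shrinkage_cov(X, alpha=0.3):
    """Shrink the demeaned sample covariance toward a scaled identity target."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)            # unknown mean: demean the data first
    S = Xc.T @ Xc / (n - 1)            # sample covariance
    mu = np.trace(S) / S.shape[0]      # scale of the identity target
    return (1 - alpha) * S + alpha * mu * np.eye(S.shape[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10)) + 3.0    # data with a nonzero, unknown mean
Sigma_hat = linear_shrinkage_cov(X)
print(Sigma_hat.shape)                 # (10, 10)
```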
We present a novel approach for black-box variational inference (VI) that
bypasses the difficulties of stochastic gradient ascent, including the task of
selecting step-sizes. Our
approach involves using a sequence of sample average approximation (SAA)
problems. SAA approximates the solution of stochastic optimization problems by
transforming them into deterministic ones. We use quasi-Newton methods and line
search to solve each deterministic optimization problem and present a heuristic
policy to automate hyperparameter selection. Our experiments show that our
method simplifies the VI problem and achieves faster performance than existing
methods.
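The core SAA idea can be shown on a one-dimensional toy problem. This is a minimal sketch of the principle only, with plain gradient descent standing in for the paper's quasi-Newton method with line search: draw the noise once, and the stochastic objective becomes a fixed deterministic function.

```python
import numpy as np

# SAA: fix the base samples once, so E[(x - xi)^2] becomes a
# deterministic objective that any deterministic optimizer can
# minimize without tuning step sizes against noise.
rng = np.random.default_rng(0)
xi = rng.normal(loc=1.0, scale=1.0, size=2000)   # fixed base samples

def saa_objective(x):
    return np.mean((x - xi) ** 2)

def saa_grad(x):
    return 2.0 * np.mean(x - xi)

x = 5.0
for _ in range(200):
    x -= 0.1 * saa_grad(x)       # deterministic descent, no noise

# the SAA minimizer is exactly the sample mean of the fixed draws
```

Because the samples are frozen, successive iterations see the same objective, which is what makes quasi-Newton updates and line searches well defined in the VI setting.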
( 2
min )
In data-driven stochastic optimization, model parameters of the underlying
distribution need to be estimated from data in addition to the optimization
task. Recent literature suggests the integration of the estimation and
optimization processes, by selecting model parameters that lead to the best
empirical objective performance. Such an integrated approach can be readily
shown to outperform simple ``estimate then optimize'' when the model is
misspecified. In this paper, we argue that when the model class is rich enough
to cover the ground truth, the performance ordering between the two approaches
is reversed for nonlinear problems in a strong sense. Simple ``estimate then
optimize'' outperforms the integrated approach in terms of stochastic dominance
of the asymptotic optimality gap, i.e., the mean, all other moments, and the
entire asymptotic distribution of the optimality gap are always better.
Analogous results also hold under constrained settings and when contextual
features are available. We also provide experimental findings to support our
theory.
( 2
min )
PAC-Bayes learning is an established framework to assess the generalisation
ability of a learning algorithm during the training phase. However, it remains
challenging to know whether PAC-Bayes is useful to understand, before training,
why the outputs of well-known algorithms generalise well. We positively answer
this question by expanding the \emph{Wasserstein PAC-Bayes} framework, briefly
introduced in \cite{amit2022ipm}. We provide new generalisation bounds
exploiting geometric assumptions on the loss function. Using our framework, we
prove, before any training, that the output of an algorithm from
\citet{lambert2022variational} has a strong asymptotic generalisation ability.
More precisely, we show that it is possible to incorporate optimisation results
within a generalisation framework, building a bridge between PAC-Bayes and
optimisation algorithms.
( 2
min )
Ultrasound is the primary modality for examining fetal growth during pregnancy,
but image quality can be affected by various factors. Quality
assessment is essential for controlling the quality of ultrasound images to
guarantee both the perceptual and diagnostic values. Existing automated
approaches often require heavy structural annotations and the predictions may
not necessarily be consistent with the assessment results by human experts.
Furthermore, the overall quality of a scan and the correlation between the
quality of frames should not be overlooked. In this work, we propose a
reinforcement learning framework powered by two hierarchical agents that
collaboratively learn to perform both frame-level and video-level quality
assessments. It is equipped with a specially-designed reward mechanism that
considers temporal dependency among frame quality and only requires sparse
binary annotations to train. Experimental results on a challenging fetal brain
dataset verify that the proposed framework could perform dual-level quality
assessment and its predictions correlate well with the subjective assessment
results.
( 2
min )
This paper considers the problem of testing the maximum in-degree of the
Bayes net underlying an unknown probability distribution $P$ over $\{0,1\}^n$,
given sample access to $P$. We show that the sample complexity of the problem
is $\tilde{\Theta}(2^{n/2}/\varepsilon^2)$. Our algorithm relies on a
testing-by-learning framework, previously used to obtain sample-optimal
testers; in order to apply this framework, we develop new algorithms for
``near-proper'' learning of Bayes nets, and high-probability learning under
$\chi^2$ divergence, which are of independent interest.
( 2
min )
We present the first $\varepsilon$-differentially private, computationally
efficient algorithm that estimates the means of product distributions over
$\{0,1\}^d$ accurately in total-variation distance, whilst attaining the
optimal sample complexity to within polylogarithmic factors. The prior work had
either solved this problem efficiently and optimally under weaker notions of
privacy, or had solved it optimally while having exponential running times.
( 2
min )
Machine learning algorithms, both in their classical and quantum versions,
heavily rely on gradient-based optimization algorithms such as gradient
descent. The overall performance depends on the appearance of local minima and
barren plateaus, which slow down calculations and lead to non-optimal
solutions. In practice, this results in dramatic computational and energy costs
for AI applications. In this paper we introduce a generic strategy to
accelerate and improve the overall performance of such methods, alleviating
the effect of barren plateaus and local minima. Our method is based on
coordinate transformations, somewhat similar to variational rotations, adding
extra directions in parameter space that depend on the cost function itself
and allow the configuration landscape to be explored more efficiently. The
validity of our method is benchmarked by boosting a number of quantum machine
learning algorithms, yielding a very significant improvement in their
performance.
( 2
min )
Edge computing solutions that enable the extraction of high-level information
from a variety of sensors are in increasingly high demand, driven by the
growing number of smart devices that require sensory processing at the edge. To
tackle this problem, we present a smart vision sensor System on Chip (SoC),
featuring an event-based camera and a low-power asynchronous spiking
Convolutional Neural Network (sCNN) computing architecture embedded on a single
chip. By combining both sensor and processing on a single die, we can lower
unit production costs significantly. Moreover, the simple end-to-end nature of
the SoC facilitates small stand-alone applications as well as functioning as an
edge node in larger systems. The event-driven nature of the vision sensor
delivers high-speed signals in a sparse data stream. This is reflected in the
processing pipeline, which focuses on optimising highly sparse computation and
minimising latency for 9 sCNN layers to $3.36\mu s$. Overall, this results in
an extremely low-latency visual processing pipeline deployed on a small form
factor with a low energy budget and sensor cost. We present the asynchronous
architecture, the individual blocks, and the sCNN processing principle, and
benchmark against other sCNN-capable processors.
( 3
min )
With the increasing penetration of renewable power sources such as wind and
solar, accurate short-term (nowcasting) renewable power prediction is becoming
increasingly important. This paper investigates multi-modal (MM) learning and
end-to-end (E2E) learning for nowcasting renewable power as an intermediate
to energy management systems. MM combines features from all-sky imagery and
meteorological sensor data as two modalities to predict renewable power
generation that otherwise could not be combined effectively. The combined,
predicted values are then input to a differentiable optimal power flow (OPF)
formulation simulating the energy management. For the first time, MM is
combined with E2E training of the model that minimises the expected total
system cost. The case study tests the proposed methodology on the real sky and
meteorological data from the Netherlands. In our study, the proposed MM-E2E
model reduced system cost by 30% compared to uni-modal baselines.
( 2
min )
We consider the problem of synthetically generating data that can closely
resemble human decisions made in the context of an interactive human-AI system
like a computer game. We propose a novel algorithm that can generate synthetic,
human-like, decision making data while starting from a very small set of
decision making data collected from humans. Our proposed algorithm integrates
the concept of reward shaping with an imitation learning algorithm to generate
the synthetic data. We have validated our synthetic data generation technique
by using the synthetically generated data as a surrogate for human interaction
data to solve three sequential decision making tasks of increasing complexity
within a small computer game-like setup. Different empirical and statistical
analyses of our results show that the synthetically generated data can
substitute the human data and perform the game-playing tasks almost
indistinguishably, with very low divergence, from a human performing the same
tasks.
( 2
min )
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial
examples. Moreover, the transferability of the adversarial examples has
received broad attention in recent years, which means that adversarial examples
crafted by a surrogate model can also attack unknown models. This phenomenon
gave birth to the transfer-based adversarial attacks, which aim to improve the
transferability of the generated adversarial examples. In this paper, we
propose to improve the transferability of adversarial examples in the
transfer-based attack via masking unimportant parameters (MUP). The key idea in
MUP is to refine the pretrained surrogate models to boost the transfer-based
attack. Based on this idea, a Taylor expansion-based metric is used to evaluate
the parameter importance score and the unimportant parameters are masked during
the generation of adversarial examples. This process is simple, yet can be
naturally combined with various existing gradient-based optimizers for
generating adversarial examples, thus further improving the transferability of
the generated adversarial examples. Extensive experiments are conducted to
validate the effectiveness of the proposed MUP-based methods.
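The scoring-and-masking step can be illustrated on a toy parameter vector. This is a hypothetical numpy sketch of the idea, not the paper's code: score each surrogate parameter by a first-order Taylor estimate of the loss change if it were zeroed, |w * dL/dw|, then mask out the lowest-scoring fraction before crafting adversarial examples.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)            # surrogate model parameters (toy)
grad = rng.normal(size=100)         # dL/dw from a batch of inputs (toy)

importance = np.abs(w * grad)       # Taylor-expansion importance score
k = int(0.3 * w.size)               # mask the 30% least important
mask = np.ones_like(w)
mask[np.argsort(importance)[:k]] = 0.0

w_refined = w * mask                # refined surrogate used by the attack
```

In the actual attack, `w_refined` would replace the surrogate's weights while any gradient-based optimizer (FGSM, MI-FGSM, etc.) generates the adversarial examples as usual.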
( 2
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This paper studies the problem of online performance optimization of
constrained closed-loop control systems, where both the objective and the
constraints are unknown black-box functions affected by exogenous time-varying
contextual disturbances. A primal-dual contextual Bayesian optimization
algorithm is proposed that achieves sublinear cumulative regret with respect to
the dynamic optimal solution under certain regularity conditions. Furthermore,
the algorithm achieves zero time-average constraint violation, ensuring that
the average value of the constraint function satisfies the desired constraint.
The method is applied to both sampled instances from Gaussian processes and a
continuous stirred tank reactor parameter tuning problem; simulation results
show that the method simultaneously provides close-to-optimal performance and
maintains constraint feasibility on average. This contrasts with current
state-of-the-art methods, which either suffer from large cumulative regret or
severe constraint violations in the case studies presented.
( 2
min )
Deploying deep learning models in real-world certified systems requires the
ability to provide confidence estimates that accurately reflect their
uncertainty. In this paper, we demonstrate the use of the conformal prediction
framework to construct reliable and trustworthy predictors for detecting
railway signals. Our approach is based on a novel dataset that includes images
taken from the perspective of a train operator and state-of-the-art object
detectors. We test several conformal approaches and introduce a new method
based on conformal risk control. Our findings demonstrate the potential of the
conformal prediction framework to evaluate model performance and provide
practical guidance for achieving formally guaranteed uncertainty bounds.
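The baseline split-conformal step behind such guarantees fits in a few lines. This is an illustrative sketch of plain split conformal prediction (not the paper's conformal-risk-control method): a finite-sample-corrected quantile of calibration nonconformity scores yields roughly (1 - alpha) coverage under exchangeability.

```python
import numpy as np

def conformal_threshold(scores_cal, alpha=0.1):
    """Split-conformal threshold on calibration nonconformity scores.

    Fresh detections whose score is <= the returned threshold are
    accepted; under exchangeability this gives ~(1 - alpha) coverage.
    """
    n = len(scores_cal)
    # finite-sample-corrected quantile level ceil((n+1)(1-alpha))/n
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores_cal, level, method="higher")
```

For object detection, the nonconformity score might be one minus the detector's confidence for the true signal; the threshold then calibrates which detections to trust.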
( 2
min )
This paper clarifies why bias cannot be completely mitigated in Machine
Learning (ML) and proposes an end-to-end methodology to translate the ethical
principle of justice and fairness into the practice of ML development as an
ongoing agreement with stakeholders. The pro-ethical iterative process
presented in the paper aims to challenge asymmetric power dynamics in the
fairness decision making within ML design and support ML development teams to
identify, mitigate and monitor bias at each step of ML systems development. The
process also provides guidance on how to explain the always imperfect
trade-offs in terms of bias to users.
( 2
min )
In this paper, we consider the problem of learning a neural network
controller for a system required to satisfy a Signal Temporal Logic (STL)
specification. We exploit STL quantitative semantics to define a notion of
robust satisfaction. Guaranteeing the correctness of a neural network
controller, i.e., ensuring the satisfaction of the specification by the
controlled system, is a difficult problem that received a lot of attention
recently. We provide a general procedure to construct a set of trainable High
Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas
in a fragment of STL. We use the BarrierNet, implemented by a differentiable
Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural
network controller, to guarantee the satisfaction of the STL formulas. We train
the HOCBFs together with other neural network parameters to further improve the
robustness of the controller. Simulation results demonstrate that our approach
ensures satisfaction and outperforms existing algorithms.
( 2
min )
Over the past decade, neural network (NN)-based controllers have demonstrated
remarkable efficacy in a variety of decision-making tasks. However, their
black-box nature and the risk of unexpected behaviors and surprising results
pose a challenge to their deployment in real-world systems with strong
guarantees of correctness and safety. We address these limitations by
investigating the transformation of NN-based controllers into equivalent soft
decision tree (SDT)-based controllers and its impact on verifiability.
Differently from previous approaches, we focus on discrete-output NN
controllers including rectified linear unit (ReLU) activation functions as well
as argmax operations. We then devise an exact but cost-effective transformation
algorithm, in that it can automatically prune redundant branches. We evaluate
our approach using two benchmarks from the OpenAI Gym environment. Our results
indicate that the SDT transformation can benefit formal verification, showing
runtime improvements of up to 21x and 2x for MountainCar-v0 and CartPole-v0,
respectively.
( 2
min )
The GeForce RTX 4070 GPU, the latest in the 40 Series lineup, is available today starting at $599. It comes backed by NVIDIA Studio technologies, including hardware acceleration for 3D, video and AI workflows; optimizations for RTX hardware in over 110 popular creative apps; and exclusive NVIDIA Studio apps like Omniverse, Broadcast, Canvas and RTX Remix.
( 9
min )
A new adventure with publisher Bandai Namco Europe kicks off this GFN Thursday. Some of its popular titles lead seven new games joining the cloud this week. Plus, gamers can play them on more devices than ever, with native 4K streaming for GeForce NOW available on select LG Smart TVs. Better Together Bandai Namco is Read article >
( 6
min )
The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of scalable compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are transforming their businesses. Just recently, generative AI applications like ChatGPT have captured widespread attention and imagination. We […]
( 15
min )
Amazon CodeWhisperer is an AI coding companion that helps improve developer productivity by generating code recommendations based on their comments in natural language and code in the integrated development environment (IDE). CodeWhisperer accelerates completion of coding tasks by reducing context-switches between the IDE and documentation or developer forums. With real-time code recommendations from CodeWhisperer, you […]
( 6
min )
Over the past few years, large knowledge bases have been constructed to store
massive amounts of knowledge. However, these knowledge bases are highly
incomplete, for example, over 70% of people in Freebase have no known place of
birth. To solve this problem, we propose a query-driven knowledge base
completion system with multimodal fusion of unstructured and structured
information. To effectively fuse unstructured information from the Web and
structured information in knowledge bases to achieve good performance, our
system builds multimodal knowledge graphs based on question answering and rule
inference. We propose a multimodal path fusion algorithm to rank candidate
answers based on different paths in the multimodal knowledge graphs, achieving
much better performance than question answering, rule inference and a baseline
fusion algorithm. To improve system efficiency, query-driven techniques are
utilized to reduce the runtime of our system, providing fast responses to user
queries. Extensive experiments have been conducted to demonstrate the
effectiveness and efficiency of our system.
( 2
min )
Foundation models have taken over natural language processing and image
generation domains due to the flexibility of prompting. With the recent
introduction of the Segment Anything Model (SAM), this prompt-driven paradigm
has entered image segmentation with a hitherto unexplored abundance of
capabilities. The purpose of this paper is to conduct an initial evaluation of
the out-of-the-box zero-shot capabilities of SAM for medical image
segmentation, by evaluating its performance on an abdominal CT organ
segmentation task, via point or bounding box based prompting. We show that SAM
generalizes well to CT data, making it a potential catalyst for the advancement
of semi-automatic segmentation tools for clinicians. We believe that this
foundation model, while not reaching state-of-the-art segmentation performance
in our investigations, can serve as a highly potent starting point for further
adaptations of such models to the intricacies of the medical domain. Keywords:
medical image segmentation, SAM, foundation models, zero-shot learning
( 2
min )
Brain-inspired hyperdimensional computing (HDC) has been recently considered
a promising learning approach for resource-constrained devices. However,
existing approaches use static encoders that are never updated during the
learning process. Consequently, it requires a very high dimensionality to
achieve adequate accuracy, severely lowering the encoding and training
efficiency. In this paper, we propose DistHD, a novel dynamic encoding
technique for HDC adaptive learning that effectively identifies and regenerates
dimensions that mislead the classification and compromise the learning quality.
Our proposed algorithm DistHD successfully accelerates the learning process and
achieves the desired accuracy with considerably lower dimensionality.
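For contrast with DistHD's dynamic encoder, here is a minimal static HDC pipeline of the kind being improved upon, sketched in numpy under simplifying assumptions (bipolar hypervectors, a fixed random projection that is never updated, and bundled class prototypes):

```python
import numpy as np

rng = np.random.default_rng(0)
D, F = 1024, 16                      # hyperdimension, feature count
P = rng.normal(size=(D, F))          # static random projection (never
                                     # updated -- what DistHD improves on)

def encode(x):
    return np.sign(P @ x)            # bipolar {-1, +1} hypervector

def train_prototypes(X, y, n_classes):
    protos = np.zeros((n_classes, D))
    for x, c in zip(X, y):
        protos[c] += encode(x)       # bundle class members together
    return protos

def classify(x, protos):
    return int(np.argmax(protos @ encode(x)))   # nearest prototype
```

DistHD's contribution is to detect dimensions of `P` that mislead classification and regenerate them during learning, so that a much smaller `D` suffices.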
( 2
min )
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random
variables (the vertices); a Bayesian Network Distribution (BND) is a
probability distribution on the random variables that is Markovian on the
graph. A finite $k$-mixture of such models is graphically represented by a
larger graph which has an additional "hidden" (or "latent") random variable
$U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other
vertex. Models of this type are fundamental to causal inference, where $U$
models an unobserved confounding effect of multiple populations, obscuring the
causal relationships in the observable DAG. By solving the mixture problem and
recovering the joint probability distribution on $U$, traditionally
unidentifiable causal relationships become identifiable. Using a reduction to
the more well-studied "product" case on empty graphs, we give the first
algorithm to learn mixtures of non-empty DAGs.
( 2
min )
Deep feedforward networks initialized along the edge of chaos exhibit
exponentially superior training ability as quantified by maximum trainable
depth. In this work, we explore the effect of saturation of the tanh activation
function along the edge of chaos. In particular, we determine the line of
uniformity in phase space along which the post-activation distribution has
maximum entropy. This line intersects the edge of chaos, and indicates the
regime beyond which saturation of the activation function begins to impede
training efficiency. Our results suggest that initialization along the edge of
chaos is a necessary but not sufficient condition for optimal trainability.
( 2
min )
Statistical optimality benchmarking is crucial for analyzing and designing
time series classification (TSC) algorithms. This study proposes to benchmark
the optimality of TSC algorithms in distinguishing diffusion processes by the
likelihood ratio test (LRT). The LRT is an optimal classifier by the
Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because
the LRT does not need training, and the diffusion processes can be efficiently
simulated and are flexible to reflect the specific features of real-world
applications. We demonstrate the benchmarking with three widely-used TSC
algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve the
LRT optimality for univariate time series and multivariate Gaussian processes.
However, these model-agnostic algorithms are suboptimal in classifying
high-dimensional nonlinear multivariate time series. Additionally, the LRT
benchmark provides tools to analyze the dependence of classification accuracy
on the time length, dimension, temporal sampling frequency, and randomness of
the time series.
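The benchmark's core observation is easy to demonstrate in a simplified setting. In this sketch (assumed setup, not the paper's diffusion processes), both classes are Gaussian with known means and shared white-noise covariance, so the Neyman-Pearson-optimal classifier simply compares exact log-likelihoods and needs no training:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 50, 1.0
t = np.linspace(0, 1, T)
mu0, mu1 = np.sin(2 * np.pi * t), np.zeros(T)   # the two hypotheses

def lrt_label(x):
    # exact log-likelihoods for iid N(mu, sigma^2) observations
    ll0 = -np.sum((x - mu0) ** 2) / (2 * sigma ** 2)
    ll1 = -np.sum((x - mu1) ** 2) / (2 * sigma ** 2)
    return 0 if ll0 > ll1 else 1

X = np.vstack([mu0 + sigma * rng.normal(size=(200, T)),
               mu1 + sigma * rng.normal(size=(200, T))])
y = np.repeat([0, 1], 200)
acc = np.mean([lrt_label(x) == c for x, c in zip(X, y)])
```

Any trained classifier's accuracy on the same simulated data can then be compared against this closed-form optimum.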
( 2
min )
The convergence rates for convex and non-convex optimization methods depend
on the choice of a host of constants, including step sizes, Lyapunov function
constants and momentum constants. In this work we propose the use of factorial
powers as a flexible tool for defining constants that appear in convergence
proofs. We list a number of remarkable properties that these sequences enjoy,
and show how they can be applied to convergence proofs to simplify or improve
the convergence rates of the momentum method, accelerated gradient and the
stochastic variance reduced method (SVRG).
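One such remarkable property is that factorial powers obey a discrete analogue of the power rule, which is what lets sums in convergence proofs telescope cleanly. Taking the falling factorial power $k^{(a)} = k(k-1)\cdots(k-a+1)$ as a representative definition (the paper's exact conventions may differ):

```python
import math

# Falling factorial power k^(a) = k (k-1) ... (k-a+1). Its forward
# difference mirrors d/dx x^a = a x^(a-1):
#     (k+1)^(a) - k^(a) = a * k^(a-1)
# so sums of a * k^(a-1) telescope exactly, with no error terms.
def falling(k, a):
    return math.prod(k - i for i in range(a))   # empty product = 1
```

For example, summing `2 * falling(k, 1)` for k = 0..n-1 gives exactly `falling(n, 2)`, the discrete counterpart of integrating 2x to get x^2.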
( 2
min )
Experts convene to peek under the hood of AI-generated code, language, and images as well as its capabilities, limitations, and future impact.
( 11
min )
Martin Luther King Jr. Scholar Brian Nord trains machines to explore the cosmos and fights for equity in research.
( 9
min )
This is a guest post co-written with Moulham Zahabi from Matarat. Probably everyone has checked their baggage when flying, and waited anxiously for their bags to appear at the carousel. Successful and timely delivery of your bags depends on a massive infrastructure called the baggage handling system (BHS). This infrastructure is one of the key […]
( 13
min )
This is a guest post by Carter Huffman, CTO and Co-founder at Modulate. Modulate is a Boston-based startup on a mission to build richer, safer, more inclusive online gaming experiences for everyone. We’re a team of world-class audio experts, gamers, allies, and futurists who are eager to build a better online world and make voice […]
( 7
min )
Globally, many organizations have critical business data dispersed among various content repositories, making it difficult to access this information in a streamlined and cohesive manner. Creating a unified and secure search experience is a significant challenge for organizations because each repository contains a wide range of document formats and access control mechanisms. Amazon Kendra is […]
( 10
min )
This is a guest blog post co-written with Hussain Jagirdar from Games24x7. Games24x7 is one of India’s most valuable multi-game platforms and entertains over 100 million gamers across various skill games. With “Science of Gaming” as their core philosophy, they have enabled a vision of end-to-end informatics around game dynamics, game platforms, and players by […]
( 11
min )
Creating a map requires masterful geographical knowledge, artistic skill and evolving technologies that have taken people from using hand-drawn sketches to satellite imagery. Just as important, changes need to be navigated in the way people consume maps, from paper charts to GPS navigation and interactive online charts. The way people think about video games is Read article >
( 6
min )
Imagine a stroller that can drive itself, help users up hills, brake on slopes and provide alerts of potential hazards. That’s what GlüxKind has done with Ella, an award-winning smart stroller that uses the NVIDIA Jetson edge AI and robotics platform to power its AI features. Kevin Huang and Anne Hunger are the co-founders of Read article >
( 5
min )
Deep classifier neural networks enter the terminal phase of training (TPT)
when training error reaches zero and tend to exhibit intriguing Neural Collapse
(NC) properties. Neural collapse essentially represents a state at which the
within-class variability of final hidden layer outputs is infinitesimally small
and their class means form a simplex equiangular tight frame. This simplifies
the last layer behaviour to that of a nearest-class center decision rule.
Despite the simplicity of this state, the dynamics and implications of reaching
it are yet to be fully understood. In this work, we review the principles which
aid in modelling neural collapse, followed by the implications of this state on
generalization and transfer learning capabilities of neural networks. Finally,
we conclude by discussing potential avenues and directions for future research.
( 2
min )
Understanding decisions made by neural networks is key for the deployment of
intelligent systems in real world applications. However, the opaque decision
making process of these systems is a disadvantage where interpretability is
essential. Many feature-based explanation techniques have been introduced over
the last few years in the field of machine learning to better understand
decisions made by neural networks and have become an important component to
verify their reasoning capabilities. However, existing methods do not allow
statements to be made about the uncertainty regarding a feature's relevance for
the prediction. In this paper, we introduce Monte Carlo Relevance Propagation
(MCRP) for feature relevance uncertainty estimation. A simple but powerful
method based on Monte Carlo estimation of the feature relevance distribution to
compute feature relevance uncertainty scores that allow a deeper understanding
of a neural network's perception and reasoning.
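The Monte Carlo estimation step can be sketched on a toy model. This is a hypothetical illustration (a fixed linear "network" with gradient-times-input relevance under input noise, not the paper's relevance propagation rule): sample many stochastic relevance maps and summarize each feature by its Monte Carlo mean and standard deviation, the latter serving as the uncertainty score.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, 0.1, -2.0])          # toy "network": f(x) = w @ x
x = np.array([1.0, 1.0, 1.0])

samples = []
for _ in range(500):
    x_noisy = x + 0.1 * rng.normal(size=x.size)
    samples.append(w * x_noisy)          # gradient * input relevance map
samples = np.asarray(samples)

relevance_mean = samples.mean(axis=0)    # point relevance per feature
relevance_std = samples.std(axis=0)      # uncertainty per feature
```

Features whose relevance varies little across samples (small std) can be trusted more than those whose relevance fluctuates strongly.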
( 2
min )
Object detection is a crucial task in computer vision that aims to identify
and localize objects in images or videos. The recent advancements in deep
learning and Convolutional Neural Networks (CNNs) have significantly improved
the performance of object detection techniques. This paper presents a
comprehensive study of object detection techniques in unconstrained
environments, including various challenges, datasets, and state-of-the-art
approaches. Additionally, we present a comparative analysis of the methods and
highlight their strengths and weaknesses. Finally, we provide some future
research directions to further improve object detection in unconstrained
environments.
( 2
min )
Deep machine learning models including Convolutional Neural Networks (CNN)
have been successful in the detection of Mild Cognitive Impairment (MCI) using
medical images, questionnaires, and videos. This paper proposes a novel
Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to
distinguish MCI from those with normal cognition by analyzing facial features.
The data comes from the I-CONECT, a behavioral intervention trial aimed at
improving cognitive function by providing frequent video chats. MC-ViViT
extracts spatiotemporal features of videos in one branch and augments
representations by the MC module. The I-CONECT dataset is challenging as the
dataset is imbalanced containing Hard-Easy and Positive-Negative samples, which
impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy
and Positive-Negative Samples (HP Loss) by combining Focal loss and AD-CORRE
loss to address the imbalance problem. Our experimental results on the
I-CONECT dataset show the great potential of MC-ViViT in predicting MCI, with
a high accuracy of 90.63\% on some of the interview videos.
( 2
min )
Recently, large language models (LLMs) like ChatGPT have demonstrated
remarkable performance across a variety of natural language processing tasks.
However, their effectiveness in the financial domain, specifically in
predicting stock market movements, remains to be explored. In this paper, we
conduct an extensive zero-shot analysis of ChatGPT's capabilities in multimodal
stock movement prediction, on three tweets and historical stock price datasets.
Our findings indicate that ChatGPT is a "Wall Street Neophyte" with limited
success in predicting stock movements, as it underperforms not only
state-of-the-art methods but also traditional methods like linear regression
using price features. Despite the potential of Chain-of-Thought prompting
strategies and the inclusion of tweets, ChatGPT's performance remains subpar.
Furthermore, we observe limitations in its explainability and stability,
suggesting the need for more specialized training or fine-tuning. This research
provides insights into ChatGPT's capabilities and serves as a foundation for
future work aimed at improving financial market analysis and prediction by
leveraging social media sentiment and historical stock data.
( 2
min )
We study a game between autobidding algorithms that compete in an online
advertising platform. Each autobidder is tasked with maximizing its
advertiser's total value over multiple rounds of a repeated auction, subject to
budget and/or return-on-investment constraints. We propose a gradient-based
learning algorithm that is guaranteed to satisfy all constraints and achieves
vanishing individual regret. Our algorithm uses only bandit feedback and can be
used with the first- or second-price auction, as well as with any
"intermediate" auction format. Our main result is that when these autobidders
play against each other, the resulting expected liquid welfare over all rounds
is at least half of the expected optimal liquid welfare achieved by any
allocation. This holds whether or not the bidding dynamics converge to an
equilibrium and regardless of the correlation structure between advertiser
valuations.
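The constraint-satisfying gradient idea can be made concrete with a hedged sketch of a budget-pacing autobidder (an illustrative stand-in, not the paper's algorithm; the second-price win rule and all parameter values here are assumptions):

```python
import numpy as np

def run_pacing_bidder(values, prices_fn, budget, T, eta=0.05):
    """Sketch of a gradient-based pacing autobidder: bid value / (1 + mu)
    and update the dual multiplier mu with the budget-constraint gradient."""
    mu = 0.0
    spend = 0.0
    total_value = 0.0
    rho = budget / T  # per-round target spend
    for t in range(T):
        v = values[t]
        bid = v / (1.0 + mu)
        price = prices_fn(t)
        # second-price style win rule; never allow overspending the budget
        win = bid >= price and spend + price <= budget
        if win:
            spend += price
            total_value += v
        cost_t = price if win else 0.0
        # gradient step on the multiplier: spending above pace raises mu
        mu = max(0.0, mu + eta * (cost_t - rho))
    return total_value, spend
```

The multiplier `mu` acts as a shadow price: spending above the per-round pace raises it and lowers subsequent bids, which is how the budget constraint stays satisfied throughout.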
( 2
min )
The paper presents a modular approach for estimating a leading vehicle's
velocity with a non-intrusive stereo camera: SiamMask is used for
leading-vehicle tracking, a kernel density estimate (KDE) is used to smooth the
distance prediction from the disparity map, and LightGBM is used for
leading-vehicle velocity estimation. Our approach yields an RMSE of 0.416,
which outperforms the baseline RMSE of 0.582 for the SUBARU Image Recognition
Challenge.
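The distance-smoothing step can be sketched as follows: per-frame distances follow the standard stereo relation Z = fB/d, and a Gaussian kernel density estimate over the distance samples is maximized to obtain an outlier-robust estimate. The focal length, baseline, and bandwidth below are made-up illustrative values, not the challenge's calibration, and the KDE is implemented by hand rather than with the pipeline's actual estimator:

```python
import numpy as np

def distance_from_disparity(disparity_px, focal_px=1400.0, baseline_m=0.35):
    # standard stereo relation Z = f * B / d (parameters are illustrative)
    return focal_px * baseline_m / disparity_px

def kde_smoothed_distance(distances, bandwidth=0.5):
    """Mode of a Gaussian KDE over noisy per-pixel distance samples,
    robust to outlier disparities."""
    grid = np.linspace(distances.min(), distances.max(), 512)
    dens = np.exp(-0.5 * ((grid[:, None] - distances[None, :]) / bandwidth) ** 2).sum(axis=1)
    return grid[np.argmax(dens)]
```

Taking the density mode rather than the mean means a handful of grossly wrong disparities (e.g. from occlusion edges) barely move the estimate.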
( 2
min )
Despite the vast body of literature on Active Learning (AL), there is no
comprehensive and open benchmark allowing for efficient and simple comparison
of proposed samplers. Additionally, the variability in experimental settings
across the literature makes it difficult to choose a sampling strategy, which
is critical due to the one-off nature of AL experiments. To address those
limitations, we introduce OpenAL, a flexible and open-source framework to
easily run and compare AL sampling strategies on a collection of realistic
tasks. The proposed benchmark is augmented with interpretability metrics and
statistical analysis methods to understand when and why some samplers
outperform others. Last but not least, practitioners can easily extend the
benchmark by submitting their own AL samplers.
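A typical AL experiment of the kind such a benchmark runs can be sketched as a generic margin-based uncertainty-sampling loop (a nearest-centroid model stands in for the real classifiers; this is not OpenAL's actual API):

```python
import numpy as np

def fit_centroids(X, y):
    # nearest-centroid classifier as a stand-in for the benchmark's models
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def margin_uncertainty(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return -np.abs(d[:, 0] - d[:, 1])  # small margin = high uncertainty

def active_learning_loop(X, y, n_rounds=10):
    # seed the labeled pool with one example per class
    labeled = [int(np.flatnonzero(y == c)[0]) for c in (0, 1)]
    for _ in range(n_rounds):
        centroids = fit_centroids(X[labeled], y[labeled])
        u = margin_uncertainty(X, centroids)
        u[labeled] = -np.inf               # never re-query a labeled point
        labeled.append(int(np.argmax(u)))  # query the most uncertain point
    centroids = fit_centroids(X[labeled], y[labeled])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.argmin(axis=1) == y).mean())
```

Comparing samplers then amounts to swapping the query rule (`np.argmax(u)`) while holding the model, data splits, and budget fixed, which is exactly the variability across papers that the benchmark aims to eliminate.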
( 2
min )
We developed a prototype device for dynamic gaze and accommodation
measurements based on 4 Purkinje reflections (PR) suitable for use in AR and
ophthalmology applications. PR1&2 and PR3&4 are used for accurate gaze and
accommodation measurements, respectively. Our eye model was developed in ZEMAX
and matches the experiments well. Our model predicts the accommodation from 4
diopters to 1 diopter with better than 0.25D accuracy. We performed
repeatability tests and obtained accurate gaze and accommodation estimations
from subjects. We are generating a large synthetic data set using physically
accurate models and machine learning.
( 2
min )
The consumption of microbial-contaminated food and water is responsible for
the deaths of millions of people annually. Smartphone-based microscopy systems
are portable, low-cost, and more accessible alternatives for the detection of
Giardia and Cryptosporidium than traditional brightfield microscopes. However,
the images from smartphone microscopes are noisier and require manual cyst
identification by trained technicians, usually unavailable in resource-limited
settings. Automatic detection of (oo)cysts using deep-learning-based object
detection could offer a solution for this limitation. We evaluate the
performance of three state-of-the-art object detectors to detect (oo)cysts of
Giardia and Cryptosporidium on a custom dataset that includes both smartphone
and brightfield microscopic images from vegetable samples. Faster RCNN,
RetinaNet, and You Only Look Once (YOLOv8s) deep-learning models were employed
to explore their efficacy and limitations. Our results show that while the
deep-learning models perform better with the brightfield microscopy image
dataset than the smartphone microscopy image dataset, the smartphone microscopy
predictions are still comparable to the prediction performance of non-experts.
( 2
min )
Deep-learning-based approaches like physics-informed neural networks (PINNs)
and DeepONets have shown promise on solving PDE constrained optimization
(PDECO) problems. However, existing methods are insufficient to handle those
PDE constraints that have a complicated or nonlinear dependency on optimization
targets. In this paper, we present a novel bi-level optimization framework to
resolve the challenge by decoupling the optimization of the targets and
constraints. For the inner loop optimization, we adopt PINNs to solve the PDE
constraints only. For the outer loop, we design a novel method by using
Broyden's method based on the Implicit Function Theorem (IFT), which is
efficient and accurate for approximating hypergradients. We further present
theoretical explanations and error analysis of the hypergradients computation.
Extensive experiments on multiple large-scale and nonlinear PDE constrained
optimization problems demonstrate that our method achieves state-of-the-art
results compared with strong baselines.
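As a concrete (toy) instance of the outer-loop hypergradient, consider a quadratic inner problem standing in for the PDE constraint: the Implicit Function Theorem expresses dw*/dθ through two Hessian blocks, and the result can be checked against finite differences. The paper approximates the linear solve with Broyden's method; a direct solve is used here for clarity:

```python
import numpy as np

def ift_hypergradient(theta, A, b):
    """Hypergradient of J(w*(theta)) = 0.5 ||w* - b||^2 where
    w*(theta) = argmin_w 0.5 ||w - A @ theta||^2, via the IFT:
    dw*/dtheta = -(d2L/dw2)^{-1} d2L/(dw dtheta)."""
    w_star = A @ theta                 # inner solve in closed form
    H_ww = np.eye(len(w_star))         # d2L/dw2 for this quadratic
    H_wt = -A                          # d2L/(dw dtheta)
    dJ_dw = w_star - b
    # the linear solve below is what Broyden's method would approximate
    dw_dtheta = -np.linalg.solve(H_ww, H_wt)
    return dw_dtheta.T @ dJ_dw
```

The same chain-rule structure carries over when the inner problem is a PINN solve; only the Hessian-vector products become expensive, which is why an efficient approximate solver matters.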
( 2
min )
This paper introduces a novel representation of Convolutional Neural Networks
(CNNs) in terms of 2-D dynamical systems. To this end, the usual description of
convolutional layers with convolution kernels, i.e., the impulse responses of
linear filters, is realized in state space as a linear time-invariant 2-D
system. The overall CNN, composed of convolutional layers and nonlinear
activation functions, is then viewed as a 2-D version of a Lur'e system, i.e.,
a linear dynamical system interconnected with static
nonlinear components. One benefit of this 2-D Lur'e system perspective on CNNs
is that we can use robust control theory much more efficiently for Lipschitz
constant estimation than previously possible.
( 2
min )
Artificial neural networks are promising for general function approximation
but challenging to train on non-independent or non-identically distributed data
due to catastrophic forgetting. The experience replay buffer, a standard
component in deep reinforcement learning, is often used to reduce forgetting
and improve sample efficiency by storing experiences in a large buffer and
using them for training later. However, a large replay buffer results in a
heavy memory burden, especially for onboard and edge devices with limited
memory capacities. We propose memory-efficient reinforcement learning
algorithms based on the deep Q-network algorithm to alleviate this problem. Our
algorithms reduce forgetting and maintain high sample efficiency by
consolidating knowledge from the target Q-network to the current Q-network.
Compared to baseline methods, our algorithms achieve comparable or better
performance in both feature-based and image-based tasks while easing the burden
of large experience replay buffers.
( 2
min )
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling. We design a preference-based adversarial attack
framework and show that our NLI-based metrics are much more robust to the
attacks than the recent BERT-based metrics. On standard benchmarks, our
NLI-based metrics outperform existing summarization metrics but perform below
SOTA MT metrics. However, when combining existing metrics with our NLI metrics,
we obtain both higher adversarial robustness (15%-30%) and higher-quality
metrics as measured on standard benchmarks (+5% to 30%).
( 2
min )
In the past few years, more and more AI applications have been deployed on
edge devices. However, models trained by data scientists with machine learning
frameworks such as PyTorch or TensorFlow cannot be seamlessly executed on edge
devices. In this paper, we develop an end-to-end code generator parsing a
pre-trained model to C source libraries for the backend using MicroTVM, a
machine learning compiler framework extension addressing inference on bare
metal devices. An analysis shows that specific compute-intensive operators can
be easily offloaded to the dedicated accelerator with a Universal Modular
Accelerator (UMA) interface, while others are processed in the CPU cores. By
using the automatically generated ahead-of-time C runtime, we conduct a hand
gesture recognition experiment on an ARM Cortex M4F core.
( 2
min )
These lecture notes provide an overview of Neural Network architectures from
a mathematical point of view. In particular, Machine Learning with Neural
Networks is framed as an optimization problem. Covered are an introduction to Neural
Networks and the following architectures: Feedforward Neural Network,
Convolutional Neural Network, ResNet, and Recurrent Neural Network.
( 2
min )
Classic online prediction algorithms, such as Hedge, are inherently unfair by
design, as they try to play the most rewarding arm as many times as possible
while ignoring the sub-optimal arms to achieve sublinear regret. In this paper,
we consider a fair online prediction problem in the adversarial setting with
hard lower bounds on the rate of accrual of rewards for all arms. By combining
elementary queueing theory with online learning, we propose a new online
prediction policy, called BanditQ, that achieves the target rate constraints
while achieving a regret of $O(T^{3/4})$ in the full-information setting. The
design and analysis of BanditQ involve a novel use of the potential function
method and are of independent interest.
( 2
min )
Geometric deep learning enables the encoding of physical symmetries in
modeling 3D objects. Despite rapid progress in encoding 3D symmetries into
Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness
of these networks through a local-to-global analysis is still lacking. In this
paper, we propose a local hierarchy of 3D isomorphism to evaluate the
expressive power of equivariant GNNs and investigate the process of
representing global geometric information from local patches. Our work leads to
two crucial modules for designing expressive and efficient geometric GNNs:
local substructure encoding (LSE) and frame transition encoding (FTE).
To demonstrate the applicability of our theory, we propose LEFTNet which
effectively implements these modules and achieves state-of-the-art performance
on both scalar-valued and vector-valued molecular property prediction tasks. We
further point out the design space for future developments of equivariant graph
neural networks. Our codes are available at
\url{https://github.com/yuanqidu/LeftNet}.
( 2
min )
Dynamic spectrum access systems typically require information about the
spectrum occupancy and thus the presence of other users in order to make a
spectrum allocation decision for a new device. Simple methods of spectrum
occupancy detection are often far from reliable; hence, spectrum occupancy
detection algorithms supported by machine learning or artificial intelligence
are widely and successfully used. To protect the privacy of user data and to
reduce the amount of control data, an interesting approach is to use federated
machine learning. This paper compares two approaches to system design using
federated machine learning: with and without a central node.
( 2
min )
Breast cancer is one of the most common and dangerous cancers in women, while
it can also afflict men. Breast cancer treatment and detection are greatly
aided by the use of histopathological images since they contain sufficient
phenotypic data. A Deep Neural Network (DNN) is commonly employed to improve
the accuracy of breast cancer detection. In our research, we have analyzed
pre-trained deep transfer learning models such as ResNet50, ResNet101, VGG16,
and VGG19 for detecting breast cancer using the 2453 histopathology images
dataset. Images in the dataset were separated into two categories: those with
invasive ductal carcinoma (IDC) and those without IDC. After analyzing the
transfer learning model, we found that ResNet50 outperformed other models,
achieving an accuracy of 90.2%, an Area Under the Curve (AUC) of 90.0%, a
recall of 94.7%, and a marginal loss of 3.5%.
( 2
min )
In the automotive industry, the full cycle of managing in-use vehicle quality
issues can take weeks to investigate. The process involves isolating root
causes, defining and implementing appropriate treatments, and refining
treatments if needed. The main pain point is the lack of a systematic method to
identify causal relationships, evaluate treatment effectiveness, and direct the
next actionable treatment if the current treatment was deemed ineffective. This
paper will show how we leverage causal Machine Learning (ML) to speed up such
processes. A real-world data set collected from on-road vehicles will be used to
demonstrate the proposed framework. Open challenges for vehicle quality
applications will also be discussed.
( 2
min )
We present a deep-learning-based approach for measuring small planetary
radial velocities in the presence of stellar variability. We use neural
networks to reduce stellar RV jitter in three years of HARPS-N sun-as-a-star
spectra. We develop and compare dimensionality-reduction and data splitting
methods, as well as various neural network architectures including single-line
CNNs, an ensemble of single-line CNNs, and a multi-line CNN. We inject
planet-like RVs into the spectra and use the network to recover them. We find
that the multi-line CNN is able to recover planets with a 0.2 m/s
semi-amplitude and a 50-day period, with 8.8% error in the amplitude and 0.7%
in the period. This
approach shows promise for mitigating stellar RV variability and enabling the
detection of small planetary RVs with unprecedented precision.
( 2
min )
Object pose estimation is a critical task in robotics for precise object
manipulation. However, current techniques heavily rely on a reference 3D
object, limiting their generalizability and making it expensive to expand to
new object categories. Direct pose predictions also provide limited information
for robotic grasping without referencing the 3D model. Keypoint-based methods
offer intrinsic descriptiveness without relying on an exact 3D model, but they
may lack consistency and accuracy. To address these challenges, this paper
proposes ShapeShift, a superquadric-based framework for object pose estimation
that predicts the object's pose relative to a primitive shape that is fitted
to the object. The proposed framework offers intrinsic descriptiveness and the
ability to generalize to arbitrary geometric shapes beyond the training set.
( 2
min )
Deep feedforward networks initialized along the edge of chaos exhibit
exponentially superior training ability as quantified by maximum trainable
depth. In this work, we explore the effect of saturation of the tanh activation
function along the edge of chaos. In particular, we determine the line of
uniformity in phase space along which the post-activation distribution has
maximum entropy. This line intersects the edge of chaos, and indicates the
regime beyond which saturation of the activation function begins to impede
training efficiency. Our results suggest that initialization along the edge of
chaos is a necessary but not sufficient condition for optimal trainability.
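The quantity underlying this kind of analysis can be sketched numerically: in the standard mean-field picture, the gradient-growth factor χ = σ_w² E_{z~N(0,q)}[tanh′(z)²] equals 1 on the edge of chaos, and saturation (large pre-activation variance q) pulls χ down. This is a textbook mean-field computation with plain quadrature, not the paper's exact calculation:

```python
import numpy as np

def chi(sigma_w, q, n_grid=20001):
    """Mean-field gradient-growth factor chi = sigma_w^2 * E[tanh'(z)^2],
    z ~ N(0, q); chi = 1 marks the edge of chaos. Riemann-sum quadrature."""
    s = np.sqrt(q)
    z = np.linspace(-10.0 * s, 10.0 * s, n_grid)
    pdf = np.exp(-z ** 2 / (2.0 * q)) / np.sqrt(2.0 * np.pi * q)
    dphi2 = (1.0 - np.tanh(z) ** 2) ** 2   # tanh'(z)^2 = sech^4(z)
    return sigma_w ** 2 * float(np.sum(dphi2 * pdf) * (z[1] - z[0]))
```

For vanishing q, χ → σ_w², so σ_w = 1 sits on the edge of chaos; increasing q saturates tanh and drives χ below 1, the regime where the abstract notes training efficiency begins to suffer.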
( 2
min )
Although neural networks (especially deep neural networks) have achieved
\textit{better-than-human} performance in many fields, their real-world
deployment is still questionable due to the lack of awareness about the
limitations in their knowledge. To incorporate such awareness in the machine
learning model, prediction with a reject option (also known as selective
classification or classification with abstention) has been proposed in the
literature. In this paper, we present a systematic review of prediction
with the reject option in the context of various neural networks. To the best
of our knowledge, this is the first study focusing on this aspect of neural
networks. Moreover, we discuss different novel loss functions related to the
reject option and post-training processing (if any) of network output for
generating suitable measurements for knowledge awareness of the model. Finally,
we address the application of the reject option in reducing prediction
time for real-time problems and present a comprehensive summary of the
techniques related to the reject option across an extensive variety of
neural networks. Our code is available on GitHub:
\url{https://github.com/MehediHasanTutul/Reject_option}
( 2
min )
Epilepsy is the most common neurological disorder and an accurate forecast of
seizures would help to overcome the patient's uncertainty and helplessness. In
this contribution, we present and discuss a novel methodology for the
classification of intracranial electroencephalography (iEEG) for seizure
prediction. Contrary to previous approaches, we categorically refrain from an
extraction of hand-crafted features and use a convolutional neural network
(CNN) topology instead for both the determination of suitable signal
characteristics and the binary classification of preictal and interictal
segments. Three different models have been evaluated on public datasets with
long-term recordings from four dogs and three patients. Overall, our findings
demonstrate the general applicability of the approach, and we discuss the
strengths and limitations of our methodology.
( 2
min )
Understanding decisions made by neural networks is key for the deployment of
intelligent systems in real world applications. However, the opaque decision
making process of these systems is a disadvantage where interpretability is
essential. Many feature-based explanation techniques have been introduced over
the last few years in the field of machine learning to better understand
decisions made by neural networks and have become an important component to
verify their reasoning capabilities. However, existing methods do not allow
statements to be made about the uncertainty regarding a feature's relevance for
the prediction. In this paper, we introduce Monte Carlo Relevance Propagation
(MCRP) for feature relevance uncertainty estimation, a simple but powerful
method based on Monte Carlo estimation of the feature relevance distribution.
It computes feature relevance uncertainty scores that allow a deeper
understanding of a neural network's perception and reasoning.
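A minimal sketch of the Monte Carlo idea, assuming a toy one-layer network and gradient-times-input relevance with random dropout masks as the noise source (the paper's exact relevance rule and noise model are not reproduced here):

```python
import numpy as np

def relevance(x, W, mask):
    """Gradient-times-input relevance for a toy net y = sum(relu(W @ x)),
    with a dropout mask on the hidden units (a stand-in for a full
    relevance-propagation rule)."""
    h = W @ x
    active = ((h > 0) * mask).astype(float)
    return (W.T @ active) * x          # dy/dx * x under this mask

def mcrp(x, W, n_samples=200, p_keep=0.8, seed=0):
    """Monte Carlo Relevance Propagation sketch: sample stochastic masks,
    recompute relevance, report per-feature mean and std (the uncertainty)."""
    rng = np.random.default_rng(seed)
    R = np.stack([relevance(x, W, rng.random(W.shape[0]) < p_keep)
                  for _ in range(n_samples)])
    return R.mean(axis=0), R.std(axis=0)
```

The std across samples is the uncertainty score: features whose relevance flips sign or magnitude between masks get flagged as unreliable explanations.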
( 2
min )
We propose a hierarchical tensor-network approach for approximating a
high-dimensional probability density from its empirical distribution. The
approach leverages randomized singular value decomposition (SVD) techniques and
involves solving
linear equations for tensor cores in this tensor network. The complexity of the
resulting algorithm scales linearly in the dimension of the high-dimensional
density. An analysis of estimation error demonstrates the effectiveness of this
method through several numerical experiments.
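The randomized SVD building block referenced here is the standard range-sketching scheme of Halko, Martinsson, and Tropp; a minimal sketch (the tensor-network machinery around it is not shown):

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, seed=0):
    """Randomized SVD: sketch the range of A with a Gaussian test matrix,
    orthonormalize, then take the SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    Omega = rng.normal(size=(A.shape[1], rank + n_oversample))
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for range(A)
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]
```

Since the expensive SVD is performed on the small (rank + oversample)-row projection rather than on A itself, the cost per tensor core stays modest, which is what lets the overall algorithm scale linearly in the dimension.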
( 2
min )
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization
method for deep networks that has exhibited performance improvements on image
and language prediction problems. We show that when SAM is applied with a
convex quadratic objective, for most random initializations it converges to a
cycle that oscillates between either side of the minimum in the direction with
the largest curvature, and we provide bounds on the rate of convergence.
In the non-quadratic case, we show that such oscillations effectively perform
gradient descent, with a smaller step-size, on the spectral norm of the
Hessian. In such cases, SAM's update may be regarded as a third derivative --
the derivative of the Hessian in the leading eigenvector direction -- that
encourages drift toward wider minima.
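The SAM update itself is short enough to sketch (standard two-gradient form; the step size and perturbation radius below are made-up illustrative values):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization step: ascend by rho in the gradient
    direction, then descend using the gradient taken at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # worst-case perturbation
    return w - lr * grad_fn(w + eps)
```

On a convex quadratic 0.5 wᵀHw this settles into a small oscillation around the minimum along the largest-curvature direction rather than converging to the exact minimizer, which is the cycling behavior the abstract analyzes.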
( 2
min )
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. However, understanding the
behavior of gradient descent, and particularly its instability, has lagged
behind its empirical success. To add to the theoretical tools available to study
gradient descent we propose the principal flow (PF), a continuous time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian the PF sheds light on
the recently observed edge of stability phenomena in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2
min )
Financial services, the gig economy, telco, healthcare, social networking, and other customers use face verification during online onboarding, step-up authentication, age-based access restriction, and bot detection. These customers verify user identity by matching the user’s face in a selfie captured by a device camera with a government-issued identity card photo or preestablished profile photo. They […]
( 10
min )
Developing web interfaces to interact with a machine learning (ML) model is a tedious task. With Streamlit, developing demo applications for your ML solution is easy. Streamlit is an open-source Python library that makes it easy to create and share web apps for ML and data science. As a data scientist, you may want to […]
( 7
min )
Enterprise customers have multiple lines of businesses (LOBs) and groups and teams within them. These customers need to balance governance, security, and compliance against the need for machine learning (ML) teams to quickly access their data science environments in a secure manner. These enterprise customers that are starting to adopt AWS, expanding their footprint on […]
( 11
min )
Amazon SageMaker Studio can help you build, train, debug, deploy, and monitor your models and manage your machine learning (ML) workflows. Amazon SageMaker Pipelines enables you to build a secure, scalable, and flexible MLOps platform within Studio. In this post, we explain how to run PySpark processing jobs within a pipeline. This enables anyone that […]
( 9
min )
RStudio on Amazon SageMaker is the first fully managed cloud-based Posit Workbench (formerly known as RStudio Workbench). RStudio on Amazon SageMaker removes the need for you to manage the underlying Posit Workbench infrastructure, so your teams can concentrate on producing value for your business. You can quickly launch the familiar RStudio integrated development environment (IDE) […]
( 10
min )
Redefining “No-Code” Development Platforms: I recently watched a video from Blizzard Entertainment Game Director Wyatt Cheng on ChatGPT’s ability to create a simple video game from scratch. While the art assets were not created by ChatGPT, the AI program Midjourney created the program using rough sketches and text prompts. Cheng created this challenge for…
The post DSC Weekly 11 April 2023 – Redefining “No-Code” Development Platforms appeared first on Data Science Central.
( 19
min )
Modern IT companies widely use virtualization due to advantages such as scalability, rational consumption of resources, and convenient backup. This article explains how Policy-Based Data Protection, a feature in NAKIVO Backup & Replication software, works, makes managing VM data protection more accessible, and outlines its benefits. What Is Policy-Based Data Protection? Policy-Based Data Protection is…
The post VM Data Protection: Automate VM Backup and Replication in a Few Clicks appeared first on Data Science Central.
( 28
min )
The digital landscape today is rapidly evolving, and businesses now face an unprecedented array of cyber threats putting sensitive data, financial assets, and even their reputation at risk.
The post Machine Learning and AI: The Future of SIEM Alternatives in Cybersecurity appeared first on Data Science Central.
( 21
min )
Sponsored Post: The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )